(54) Title: LOGIC EVENT SIMULATION

(57) Abstract: There is provided a parallel processing method of logic simulation comprising representing signals on a line over a time period as a bit sequence, evaluating the output of any logic gate including an evaluation of any inherent delay by a comparison between the bit sequences of its inputs to a predetermined series of bit patterns and in which those logic gates whose outputs have changed over the time period are identified during the evaluation of the gate outputs as real gate changes and only those real gate changes are propagated to fan-out gates and in which the control of the method is carried out in an associative memory mechanism which stores in word form a history of gate input signals by compiling a hit list register of logic gate state changes and using a multiple response resolver forming part of the associative memory mechanism which generates an address for each hit, and then scans and transfers the results on the hit list to an output register for subsequent use. The invention provides the segmentation or division of at least one of the registers or hit lists into smaller registers or hit lists to reduce computational time. Further, the invention relates to a method of handling the line signal propagation by modelling signal delays.
"Logic Event Simulation"

Introduction

The present invention is directed towards a parallel processing method of logic simulation comprising representing signals on a line over a time period as a bit sequence, evaluating the output of any logic gate including an evaluation of any inherent delay by a comparison between the bit sequences of its inputs to a predetermined series of bit patterns and in which those logic gates whose outputs have changed over the time period are identified during the evaluation of the gate outputs as real gate changes and only those real gate changes are propagated to fan-out gates and in which the control of the method is carried out in an associative memory mechanism which stores in word form a history of gate input signals by compiling a hit list register of logic gate state changes and using a multiple response resolver forming part of the associative memory mechanism which generates an address for each hit, and then scans and transfers the results on the hit list to an output register for subsequent use. The output register may contain the final result of the simulation or may be a list of outputs to be used for subsequent fan-out to other gates. Further, the invention is directed towards providing a parallel processor for logic event simulation (APPLES).

Logic simulation plays an important role in the design and validation of VLSI circuits. As circuits increase in size and complexity, there is an ever-increasing requirement to accelerate the processing speed of this design tool. Parallel processing has been perceived in industry as the best method to achieve this goal and numerous parallel processing systems have been developed. Unfortunately, large speedup figures have eluded these approaches. Higher speedup figures have been achieved, but only by compromising the accuracy of the gate delay model employed in these systems. The principal barrier is a large communication overhead, due to the basic passing of values between processors, to elaborate measures to avoid or recover from deadlock, and to load balancing techniques.

The ever-expanding size of VLSI (Very Large Scale Integration) circuits has further emphasised the need for a fast and accurate means of simulating digital circuits. A compromise between model accuracy and computational feasibility is found in logic simulation. In this simulation paradigm, signal values are discrete and may acquire, in the simplest case, the logic values 0 and 1. More complex transient state signal values are modelled using up to 9-state logic. Logic gates can be modelled as ideal components with zero switching time or, more realistically, as electronic components with finite delay and switching characteristics such as inertial, pure or ambiguous delays.

Due to the enormity of the computational effort for large circuits, the application of parallel processing to this problem has been explored. Unfortunately, large speedups have proved elusive for most systems and approaches.

Sequential (uni-processor) logic simulation can be divided into two broad categories: compiled code and event-driven simulation (Breuer et al: Diagnosis and Reliable Design of Digital Systems. Computer Science Press, New York (1976)). These techniques can be employed in a parallel environment by partitioning the circuit amongst processors. In compiled code simulation, all gates are evaluated at all time steps, even if they are not active. The circuit has to be levellised and only unit or zero delay models can be employed. Sequential circuits also pose difficulties for this type of simulation. A compiled code mechanism has been applied to several generations of specialised parallel hardware accelerators designed by IBM: the Logic Simulation Machine LSM (Howard et al: Introduction to the IBM Los Gatos Simulation Machine. Proc IEEE Int. Conf. Computer Design: VLSI in Computers (Oct 1983) 580-583), the Yorktown Simulation Engine (Pfister: The Yorktown Simulation Engine: Introduction. 19th ACM/IEEE Design Automation Conf. (June 1982) 51-54) and the Engineering Verification Engine EVE (Dunn: IBM's Engineering Design System Support for VLSI Design and Verification. IEEE Design and Test of Computers (February 1984) 30-40), with performance figures as high as 2.2 billion gate evaluations/sec reported. Agrawal et al (Logic Simulation and Parallel Processing. Int. Conf. on Computer Aided Design (1990)) have analysed the activity of several circuits, and their results indicate that at any time instant circuit activity (i.e. the proportion of gates whose outputs are in transition) is typically in the range 1% to 0.1%. Therefore, the effective number of gate evaluations of these engines is likely to be smaller by a factor of a hundred or more. Speedup values ranging from 6 to 13 for various compiled code benchmark circuits have been observed on the shared memory MIMD Encore Multimax multiprocessor by Soule and Blank (Parallel Logic Simulation on General Purpose Machines. Proc Design Automation Conf. (June 1988) 166-171). A SIMD (array) version was investigated by Kravitz (Mueller-Thuns et al: Benchmarking Parallel Processing Platforms: An Application Perspective. IEEE Trans on Parallel and Distributed Systems, 4 No. 8 (Aug 1993)) with similar results.

The intrinsic unit delay model of compiled code simulators is overly simplistic for many applications.

Some delay model limitations of compiled code simulation have been eliminated in parallel event-driven techniques. These parallel algorithms are largely composed of two phases: a gate evaluation phase and an event-scheduling phase. The gate evaluation phase identifies gates that are changing, and the scheduling phase puts the gates affected by these changes (the fan-out gates) into a time-ordered linked schedule list, determined by the current time and the delays of the active gates. Soule and Blank (Parallel Logic Simulation on General Purpose Machines. Proc Design Automation Conf. (June 1988) 166-171) and Mueller-Thuns et al (Benchmarking Parallel Processing Platforms: An Application Perspective. IEEE Trans on Parallel and Distributed Systems, 4 No. 8 (Aug 1993)) have investigated both shared and distributed memory synchronous event MIMD architectures. Again, overall performance has been disappointing: the results of several benchmarks executed on an 8-processor Encore Multimax and an 8-processor iPSC Hypercube gave speedup values ranging from only 3 to 5.

Asynchronous event simulation permits limited processor autonomy. Causality constraints require occasional synchronisation between processors and rolling back of events. Deadlock between processors must be resolved. Chandy and Misra (Asynchronous Distributed Simulation via a Sequence of Parallel Computations. Comm ACM 24(4) (April 1981) 198-206) and Bryant (Simulation of Packet Communication Architecture Computer Systems. Tech report MIT-LCS-TR-188, MIT, Cambridge (1977)) have developed deadlock avoidance algorithms, while Briner (Parallel Mixed Level Simulation of Digital Circuits Using Virtual Time. Ph.D. thesis, Dept of El. Eng., Duke University (1990)) and Jefferson (Virtual Time. ACM Trans Programming Languages and Systems (July 1985) 404-425) have explored algorithms based on deadlock recovery. The best speedup figures for shared and distributed memory asynchronous MIMD systems were 8.5 for a 14-processor system and 20 for a 32-processor BBN system.

Optimising strategies such as load balancing, circuit partitioning and distributed queues are necessary to realise the best speedup figures. Unfortunately, these mechanisms themselves contribute large communication overhead costs for even modest-sized parallel systems. Furthermore, the gate evaluation process, despite its small granularity, incurs between 10 and 250 machine cycles per gate evaluation.

The invention comprises a method and a processor for an Associative Parallel Processor for Logic Event Simulation; the processor is referred to in this specification as APPLES, and is specifically designed for parallel discrete event logic simulation and for carrying out such a parallel processing method. In summary, the invention performs gate evaluation in memory and replaces interprocessor communication with a scan technique. Further, the scan mechanism is so arranged as to facilitate parallelisation, and a wide variety of delay models may be used.

Essentially, there is therefore provided a parallel processing method of logic simulation comprising representing signals on a line over a time period as a bit sequence, evaluating the output of any logic gate including an evaluation of any inherent delay by a comparison between the bit sequences of its inputs to a predetermined series of bit patterns and in which those logic gates whose outputs have changed over the time period are identified during the evaluation of the gate outputs as real gate changes and only those real gate changes are propagated to fan-out gates. The control of the method is carried out in an associative memory mechanism which stores in word form a history of gate input signals by compiling a hit list register of logic gate state changes and using a multiple response resolver forming part of the associative memory mechanism which generates an address for each hit, and then scans and transfers the results on the hit list to an output register for subsequent use.

One of the core features of the invention is the segmentation or division of at least one of the registers or hit lists into smaller registers or hit lists to reduce computational time. The other feature of considerable importance is the handling of line signal propagation by modelling signal delays. Finally, the method according to the invention allows simulation to be carried out over arbitrarily chosen time periods.

Either the associative register is divided into separate smaller associative sub-registers, one type of logic gate being allocated to each associative sub-register, each of which associative sub-registers has corresponding sub-registers connected thereto, whereby gate evaluations and tests are carried out in parallel on each associative sub-register.

Alternatively, it is possible to achieve a satisfactory simulation, particularly where the circuit being simulated is not too large, by segmenting the hit list into a plurality of separate smaller hit lists, each connected to a separate scan register. In this case each scan register is operated in parallel to transfer the results to the output register. This overcomes the particular computational problem in these parallel processors and speeds up the whole simulation considerably.

Further, the invention provides a parallel processor for logic event simulation (APPLES) which essentially has an associative memory mechanism comprising a plurality of separate associative sub-registers, each for the storage in word form of a history of gate input signals for a specified type of logic gate. Further, there is a number of separate additional sub-registers associated with each associative sub-register, whereby gate evaluations and tests can be carried out in parallel on each associative sub-register.

In the method according to the invention, each associative sub-register is used to form a hit list connected to a corresponding separate scan register.

Ideally, when there are a number of sub-registers and the number of the one type of 
logic gate exceeds a predetermined number, more than one sub-register is used. 




Ideally, the scan registers are controlled by exception logic using an OR gate, whereby the scan is terminated for each register when the OR gate changes state, thus indicating no further matches. The predetermined number will be determined by the computational load.

The scan can be carried out in many ways, but one of the best ways of carrying it out is by sequential counting through the hit list; when this is done, generally the following steps are performed:-

checking if the bit is set, indicating a hit;

if a hit, determining the address affected by that hit;

storing the address;

clearing the bit in the hit list;

moving to the next position in the hit list; and

repeating the above steps until the hit list is cleared.

Obviously, where fan-out occurs, more than one address will subsequently be affected.
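The sequential-counting scan listed above can be sketched in Python as follows. This is a minimal model, not the hardware: the hit list is assumed to be a Python list of 0/1 bits, a hypothetical dictionary `fan_out` stands in for the fan-out memory, and `any()` plays the role of the OR gate whose change of state ends the scan once no hits remain.

```python
from typing import Dict, List


def scan_hit_list(hit_list: List[int], fan_out: Dict[int, List[int]]) -> List[int]:
    """Sequentially scan a hit list, collecting the fan-out addresses of each hit.

    hit_list: one bit (0/1) per gate; 1 marks a real gate change (a hit).
    fan_out:  hypothetical map from hit-list position to the addresses it affects.
    Returns the output register contents (affected addresses, in scan order).
    """
    output_register: List[int] = []
    position = 0
    # OR of all hit bits: once it is 0 the list is clear and the scan stops early.
    while any(hit_list):
        if hit_list[position]:                      # check if the bit is set (a hit)
            addresses = fan_out.get(position, [])   # determine the affected addresses
            output_register.extend(addresses)       # store the addresses
            hit_list[position] = 0                  # clear the bit in the hit list
        position += 1                               # move to the next position
    return output_register


hits = [0, 1, 0, 0, 1, 0]
print(scan_hit_list(hits, {1: [7, 9], 4: [2]}))  # [7, 9, 2]
```

With hits at positions 1 and 4, the scan stops after position 4 instead of visiting the whole list, which is the saving the OR-gate exception logic is meant to provide.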

In one particular embodiment of the invention, there is provided such a parallel processing method of logic simulation in which each line signal to a target logic gate is stored as a plurality of bits, each representing a delay of one time period, the aggregate bits representing the delay between signal output to and reception by the target logic gate, and in which the inherent delay of each logic gate is represented in the same manner. The time period is arbitrarily chosen and will often be of the order of 1 nanosecond or less. The fact that the time period can be arbitrarily chosen is of immense importance, since it is possible to simulate a circuit for a plurality of different time periods. Additionally, the effect of the delay inherent in the transfer of line signals between logic gates is becoming more important as the response time of circuit components decreases.
In this latter embodiment, each delay is stored as a delay word in an associative memory forming part of the associative memory mechanism, in which:-

the length of the delay word is ascertained; and

if the delay word width exceeds the associative register word width:-

the number of integer multiples of the register word width contained within the delay word is calculated as a gate state;

the gate state is stored in a further state register;

the remainder from the calculation is stored in the associative register with those delay words whose widths did not exceed the associative register word width; and

on the count of the associative register commencing:-

the state register is consulted for the delay word entered in the state register and the remainder is ignored for this count of the associative register;

at the end of the count of the associative register, the state register is updated; and

the count continues until the remainder represents the count still required.
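One reading of the steps above, sketched in Python: a delay longer than the associative register word is split into a gate state (the whole multiples of the word width) and a remainder, and the count runs one full register count per unit of state before finishing on the remainder. The function names, and the choice of decrementing the state as its "update", are assumptions for illustration.

```python
from typing import Tuple


def split_delay(delay: int, word_width: int) -> Tuple[int, int]:
    """Split a delay (in time periods) into (gate_state, remainder).

    Delays that fit within the associative register word keep state 0.
    """
    if delay <= word_width:
        return 0, delay
    return divmod(delay, word_width)


def periods_counted(delay: int, word_width: int) -> int:
    """Model the count of a long delay; returns the total periods counted."""
    state, remainder = split_delay(delay, word_width)
    periods = 0
    # While the state register is non-zero, the remainder is ignored and a
    # full count of the associative register (word_width periods) is made.
    while state > 0:
        periods += word_width
        state -= 1  # assumed update of the state register at the end of each count
    # The count then continues until the remainder has been counted.
    periods += remainder
    return periods


print(split_delay(70, 32))      # (2, 6)
print(periods_counted(70, 32))  # 70
```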

For carrying out the invention, an initialisation phase is carried out in which specified signal values are input, unspecified signal values are set to unknown, test templates are prepared defining the delay model for each logic gate, the input circuit is parsed to generate an equivalent circuit consisting of 2-input logic gates, and the 2-input logic gates are then configured.
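The parsing step that produces an equivalent circuit of 2-input gates can be illustrated for the simple case of an associative gate type. The sketch below assumes a hypothetical tuple representation `(gate_type, input_a, input_b, output)` and generated internal net names, and rewrites an n-input AND, OR or XOR gate as a chain of 2-input gates of the same type; NAND, NOR and inverters need extra handling and are deliberately left out, as is the assignment of delays to the generated gates.

```python
from typing import List, Tuple

# (gate_type, input_a, input_b, output): a hypothetical netlist tuple
Gate = Tuple[str, str, str, str]

ASSOCIATIVE_TYPES = {"AND", "OR", "XOR"}


def to_two_input(gate_type: str, inputs: List[str], output: str,
                 prefix: str = "n") -> List[Gate]:
    """Rewrite an n-input associative gate as a chain of 2-input gates.

    Internal nets are named prefix0, prefix1, ...; the last gate drives `output`.
    """
    if gate_type not in ASSOCIATIVE_TYPES:
        raise ValueError(f"unsupported gate type: {gate_type}")
    if len(inputs) < 2:
        raise ValueError("a gate needs at least two inputs")
    gates: List[Gate] = []
    acc = inputs[0]
    for i in range(1, len(inputs)):
        out = output if i == len(inputs) - 1 else f"{prefix}{i - 1}"
        gates.append((gate_type, acc, inputs[i], out))
        acc = out
    return gates


for g in to_two_input("AND", ["a", "b", "c", "d"], "y"):
    print(g)
# ('AND', 'a', 'b', 'n0')
# ('AND', 'n0', 'c', 'n1')
# ('AND', 'n1', 'd', 'y')
```

A balanced tree of 2-input gates would give a shallower equivalent circuit; the chain is used here only because it is the shortest to write.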
With the present invention, multi-valued logic may be applied, and in this situation n bits are used to represent a signal value at any instant in time, n being selected to suit the chosen logic. A particularly suitable one is an 8-valued logic in which 000 represents logic 0, 111 represents logic 1 and 001 to 110 represent arbitrarily defined other signal states.

One of the features of the invention is that the sequence of values on a logic gate is stored as a bit pattern forming a unique word in the associative memory mechanism; by doing this, it is possible to store a record of all values that a logic gate has acquired for the units of delay of the longest delay in the circuit.
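A minimal sketch of such a history word for the 8-valued logic described above (3 bits per value, 000 for logic 0 and 111 for logic 1). The layout is an assumption: the newest value enters at the most significant end and older values move right, in keeping with the right shift of words described for the hardware, so the oldest value drops off the least significant end once the word holds `depth` values. The helper names are hypothetical.

```python
BITS_PER_VALUE = 3            # 8-valued logic
VALUE_MASK = (1 << BITS_PER_VALUE) - 1
LOGIC_0, LOGIC_1 = 0b000, 0b111


def shift_in(history: int, value: int, depth: int) -> int:
    """Shift `value` into a word that records the last `depth` values."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in BITS_PER_VALUE bits")
    top = BITS_PER_VALUE * (depth - 1)        # bit offset of the newest slot
    return (history >> BITS_PER_VALUE) | (value << top)


def value_at(history: int, age: int, depth: int) -> int:
    """Return the value `age` periods ago (age 0 is the newest value)."""
    shift = BITS_PER_VALUE * (depth - 1 - age)
    return (history >> shift) & VALUE_MASK


word = 0
for v in (LOGIC_0, LOGIC_1, LOGIC_1, LOGIC_0):   # oldest first
    word = shift_in(word, v, depth=4)
print([value_at(word, age, depth=4) for age in range(4)])  # [0, 7, 7, 0]
```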

Detailed Description of the Invention 

The invention will be more clearly understood from the following description of embodiments thereof, given by way of example only, with reference to the accompanying drawings, in which:-

Fig. 1 illustrates the functions of blocks of the APPLES processor;

Fig. 2 illustrates the inertial delay mechanism in the APPLES system;

Fig. 3 is an illustration of a simulated cycle;

Fig. 4 is a test search pattern;

Fig. 5 is an illustration of the logical combination mechanism according to the invention;

Fig. 6 illustrates components active during a gate evaluation phase;

Fig. 7 shows bit patterns for an ambiguous delay model and hazard detection;

Fig. 8 is an outline of an alternative arrangement of processors according to the invention;

Fig. 9 illustrates the structure of one processor in more detail; and

Fig. 10 is a view similar to Fig. 1 of the alternative construction of processor.

The essential elemental tasks for parallel logic simulation are:

1. Gate evaluation.

2. Delay model implementation.

3. Updating fan-out gates.

The design framework for a specific parallel logic simulation architecture was developed by identifying the essential elemental simulation operations which can be performed in parallel, and by minimising the tasks that support these operations and which are totally intrinsic to the parallel system.

Activities such as event scheduling and load balancing are perceived as implementation issues which need not necessarily be incorporated into a new design. An important additional criterion is that the design must execute directly in hardware as many parallel tasks as possible, as fast as possible, but without limiting the type of delay model.

The present invention, taking account of the above objectives, incorporates several special associative memory blocks and hardware in the APPLES architecture.

The gate evaluation/delay model implementation and update/fan-out process will be explained with reference to the APPLES architecture shown in Fig. 1.

Referring to Fig. 1, the functional blocks of the APPLES processor are shown. The blocks pertinent to gate evaluation are associative array 1a, input-value-register bank 2, associative array 1b, test-result-register bank 4, group-result register bank 5 and the group-test hit list 6. The group-test hit list in turn feeds a multiple response resolver 7, which in turn feeds a fan-out memory 8 supplying an address register 9 connected to the input-value-register bank 2. The associative array 1a has an associative mask register and an input register, while the associative array 1b has a mask register and an input register. Similarly, the test-result register bank 4 has a result-activator register 14 and the group-result register bank 5 has a mask register 15 and an input register 16. Finally, an input-value register bank 17 is provided. Apart from the associative arrays, the group-result register bank has parallel search facilities. Regardless of the number of words they hold, these structures can be searched in parallel in constant time. Furthermore, the words in the input-value-register bank 17 and associative array 1b can be shifted right in parallel while resident in memory.

A gate can be evaluated once its input wire values are known. In conventional uni-processor and parallel systems these values are stored in memory and accessed by the processor(s) when the gate is activated. In APPLES, gate signal values are stored in associative memory words. The succession of signal values that have appeared on a particular wire over a period of time is stored in a given associative memory word in a time-ordered sequence. For instance, a binary value model could store in a 32-bit word the history of wire values that have appeared over the last 32 time intervals. Gate evaluation proceeds by searching in parallel for appropriate signal values in associative memory. Portions of the words which are irrelevant (e.g. only the 4 most recent bits are relevant for a 4-unit gate delay model) are masked out of the search by the memory's input and mask register combination. For a given gate type (e.g. AND, OR) and gate delay model, there are requirements on the structure of the input signals to effect an output change. Each pattern searched in associative memory detects those signal values that have a certain attribute of the necessary structure (e.g. those signals which have gone high within the last 3 time units). Those wires that have all the attributes indicate active gates. The wire values are stored in a memory block designated associative array 1b (word-line-register bank).

Only those gate types relevant to the applied search patterns are selected. This is accomplished by tagging a gate type to each word. These tags are held in associative array 1a. A specific gate type is activated by a parallel search of the designated tag in associative array 1a.
This simple evaluation mechanism implies that the wires must be identified by the type of gate into which they flow, since different gate types have different input wire sequences that activate them. Gates of a certain type are selected by a parallel search on gate type identifiers in associative array 1a.
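The two parallel searches described above, one on the gate-type tags in associative array 1a and one on the wire-value words in associative array 1b under an input/mask register pair, can be modelled sequentially in Python. A word matches when it agrees with the input (pattern) register on every bit selected by the mask register. The 4-bit word width, the placement of the newest value in the most significant bit and the tag strings are assumptions made for the example; the hardware compares all words at once rather than looping.

```python
from typing import List


def masked_search(words: List[int], pattern: int, mask: int) -> List[int]:
    """1 for each word whose bits selected by `mask` equal those of `pattern`."""
    return [1 if ((w ^ pattern) & mask) == 0 else 0 for w in words]


def select_active(tags: List[str], words: List[int], gate_type: str,
                  pattern: int, mask: int) -> List[int]:
    """Combine a tag search (array 1a) with a value search (array 1b)."""
    type_hits = [1 if t == gate_type else 0 for t in tags]
    value_hits = masked_search(words, pattern, mask)
    return [a & b for a, b in zip(type_hits, value_hits)]


# 4-bit histories, newest value in bit 3: search for "now 1, previously 0".
tags = ["AND", "AND", "OR", "AND"]
words = [0b1011, 0b1100, 0b1000, 0b0111]
print(select_active(tags, words, "AND", pattern=0b1000, mask=0b1100))  # [1, 0, 0, 0]
```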


Each signal attribute corresponds to a bit pattern search in memory. Since several attributes are normally required for an activated gate, the result of several pattern searches must be recorded. These searches can be considered as tests on words.

The result of a test is either successful or not. This can be recorded as a single bit in a corresponding word in another register held in a register bank termed the test-result-register bank. Since each gate is assumed to have two inputs (inverters and multiple-input gates are translated into their 2-input gate circuit equivalents), tests are combined on pairs of words in this bank. This combination mechanism is specific to a delay model, is defined by the result-activator register, and consists of simple AND or OR operations between bits in the word pairs.

The result of combining each word pair, the final stage of the gate evaluation process, is stored as a single word in another associative array, the group-result register bank 5. Active gates will have a unique bit pattern in this bank and can be identified by a parallel search for this bit pattern. Successful candidates of this search set their bit in the 1-bit column register, the group-test hit list.
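The test, combination and group-result stages can be put together in a short Python model. The delay model here is an assumed example, not one given in this description: a 2-input AND gate under a unit-delay model whose output is about to rise, which requires both inputs to be 1 now (tests combined with AND) and at least one input to have been 0 one period earlier (tests combined with OR). Words 2k and 2k+1 are taken to be the two inputs of gate k, the required group-result pattern is all ones, and the column-by-column timing of the hardware is not modelled.

```python
from typing import List, Tuple

WIDTH = 4                        # assumed history length
NEWEST = 1 << (WIDTH - 1)        # newest value held in the most significant bit
PREVIOUS = NEWEST >> 1

# Hypothetical test template: (pattern, mask, result-activator operator).
AND_RISING_TESTS: List[Tuple[int, int, str]] = [
    (NEWEST, NEWEST, "AND"),     # both inputs are 1 now
    (0, PREVIOUS, "OR"),         # at least one input was 0 one period ago
]


def evaluate(words: List[int], tests: List[Tuple[int, int, str]],
             required: List[int]) -> List[int]:
    """Return the group-test hit list, one bit per 2-input gate."""
    # Test-result register bank: one bit per test for every wire word.
    test_results = [[1 if ((w ^ p) & m) == 0 else 0 for p, m, _ in tests]
                    for w in words]
    hit_list = []
    for k in range(0, len(words), 2):        # words k and k+1 feed one gate
        a, b = test_results[k], test_results[k + 1]
        # Combine each column with the operator from the result-activator.
        group_result = [x & y if op == "AND" else x | y
                        for x, y, (_, _, op) in zip(a, b, tests)]
        # Parallel search of the group-result bank for the required pattern.
        hit_list.append(1 if group_result == required else 0)
    return hit_list


words = [0b1000, 0b1100,   # gate 0: inputs now 1,1; previously 0,1 (rises)
         0b1100, 0b1100,   # gate 1: already 1,1 (no change)
         0b0000, 0b1000]   # gate 2: one input is 0 now (not active)
print(evaluate(words, AND_RISING_TESTS, required=[1, 1]))  # [1, 0, 0]
```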

The bits in each column position of every gate pair in the test-result register bank 4 are combined in accordance with the logic operators defined in the result-activator register. The bits in each column are combined sequentially in time in order to reduce the number of output lines in the test-result-register bank 4. Thus, only one output line is required for each gate pair in the test-result register bank, instead of one wire for each column position.

The results of the combination of gate pairs in the test-result register bank 4 are written column by column into the group-result register bank 5. Only one column is written in parallel at a particular clock edge. This implies that only one input wire to the group-result register bank 5 is required per gate pair in the test-result register bank. This reduces the number of connections from the test-result register bank to the group-result register bank.

The scan registers are independent in so far as they can be decremented or incremented while other scan registers are disabled; however, they are clocked in unison by one clock signal.

The optimum number of scan registers is given by the inverse of the probability of a hit being detected in the hit list.

It is essential that an OR operation of all bits in the hit list is computed on one edge of a clock period to determine when all hit bits are clear, and that on the opposite edge of the same clock cycle any scan register that is given access to its fan-out list is permitted to clear the hit bit that it has detected. The access is controlled by a wait semaphore system to ensure that only one access at a time is made to each single-ported memory.

An alternative system consists of a multi-ported fan-out memory, consisting of several memory banks, each of which can be accessed simultaneously. Each memory bank in the system has its own semaphore control mechanism.

An alternative strategy has a hit bit enable the inputs of its fan-out list in the input-value register. The enable connections from the hit list to the appropriate elements in the input-value register bank are made prior to the commencement of the simulation and are determined by the connectivity between the gates in the circuit being simulated. These connections can be made by a dynamically configured device such as an FPGA (Field Programmable Gate Array), which can physically route the hit list element to its fan-out inputs. In the process, all active fan-out elements so connected will be enabled simultaneously and updated with the same logic value in parallel.

The control core consists of a synchronised, self-regulated sequence of events, identified in one example of the Verilog code as e0, e1, e2, etc. An event corresponds to the completion of a major task. Self-regulation means that there is no software controlling the sequence of events, although there may be software external to the processor which will solicit information concerning the status of the processor. Furthermore, it implies that there is no microprogramming involved in the design. This eliminates the need for a microprogrammed unit and increases the speed of processing.

In the fan-out update activity, controlled for example by e20, it is essential that the event that the multiple response resolver 7 has no more hits to be detected terminates this activity. Alternatively, this activity could be terminated by the event that the whole hit list has been scanned. However, detecting that no more hits exist can terminate this fan-out update procedure early and leads to a faster execution time of this procedure.

Some logic entities may have delays which exceed the time frame representable in the word of associative array 1b. Larger delays can be modelled by associating a state with a gate type. In this case a gate and its state are defined in associative array 1a. Tests are performed on associative array 1b and, when a gate with a given state passes some input value criterion, in addition to the fan-out components of the gate possibly being affected, the gate state is amended in associative array 1a. This new state may also cause a new output value to be ascribed to the fan-out list of the gate. The tests that are applied are determined by the gate type and state. In this mechanism the fan-out list of a gate includes the normal fan-out inputs and the address in associative array 1a of the gate itself.

In order to determine whether the state alone, or the state and the fan-out gates, are to be updated, the state (a binary value) can serve as an offset into the gate's fan-out update data files. The state is added to the start location of each of a gate's data files, and this enables the gate's normal fan-out list to be bypassed or not.
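The offset addressing can be illustrated with a hypothetical layout of a gate's fan-out update data file: the gate's start location plus its state selects an entry. In this assumed layout, state 0 selects an entry holding the normal fan-out inputs together with the gate's own address in associative array 1a, while state 1 selects an entry holding only the gate's own address, so the normal fan-out list is bypassed. The layout, the names and the meaning given to each state are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical fan-out update memory: each entry lists the addresses to update.
fan_out_memory: List[List[str]] = [
    ["g7.in_a", "g9.in_b", "state:g3"],   # g3, state 0: normal fan-out + own state
    ["state:g3"],                          # g3, state 1: own state only (bypass)
]
start_location: Dict[str, int] = {"g3": 0}


def update_targets(gate: str, state: int) -> List[str]:
    """Select a gate's update entry by adding its state to its start location."""
    return fan_out_memory[start_location[gate] + state]


print(update_targets("g3", 0))  # ['g7.in_a', 'g9.in_b', 'state:g3']
print(update_targets("g3", 1))  # ['state:g3']
```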

The interconnect between logic entities being simulated can be modelled using a large delay model described below. Furthermore, single wires can be modelled by one word instead of two in associative array 1a, associative array 1b and the test-result register bank 4. Branch points are modelled as separate wires, permitting different branch points to have different delay characteristics.
An efficient implementation uses single-word versions of associative array 1a, associative array 1b and the test-result register bank.

The APPLES gate evaluation mechanism selects gates of a certain type, applies a sequence of bit pattern searches (tests) to them and ascertains the active gates by recording the result of each pattern search and determining those that have fulfilled all the necessary tests. This mechanism executes gate evaluation in constant time, since the parallel search is independent of the number of words. This is an effective linear speedup for the evaluation activity. It also facilitates different delay models, since a delay model can be defined by a set of search patterns. Further discussion of this is given below.

Active gates set their bits in the column hit list. A multiple response resolver scans through this list. The multiple response resolver can be a single counter which inspects the entire list from top to bottom, stops when it encounters a set bit and then uses its current value as a vector for the fan-out list of the identified active gate. This list holds the addresses of the fan-out gate inputs in an input-value register bank. The new logic values of the active gates are written into the appropriate words of this bank.

It then clears the bit before decrementing through the remainder of the list and repeating this process. All hit bits are ORed together so that, when all bits are clear, this can be detected immediately and no further scanning need be done.

Several scan registers can be used in the multiple response resolver to scan the column hit list in parallel. Each operates autonomously except when two or more registers simultaneously detect a hit, in which case a clash has occurred. Each scan register must then wait until it is arbitrarily allowed to access and update its fan-out list. Each register scans an equal-sized portion of the list. The frequency of clashes depends on the probability of a hit for each scan register; typically this probability is between 0.01 and 0.001 for digital circuits. The timing mechanism in APPLES enables only active gates to be identified, and the multiple scan register structure provides a pipeline of gates to be updated for the current time interval without an explicit scheduling mechanism. The scheduler has been replaced by this more efficient parallel scan procedure.
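A clock-level Python model of several scan registers working on equal portions of the hit list is sketched below. It assumes a single-ported fan-out memory, so when two or more registers detect a hit in the same clock (a clash) only one is granted access; granting the lowest-numbered register is an arbitrary choice standing in for the semaphore arbitration. Registers advance one position per clock, and the loop ends when the OR of all hit bits is 0. The function name and the cycle accounting are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def parallel_scan(hit_list: List[int], fan_out: Dict[int, List[int]],
                  n_registers: int) -> Tuple[List[int], int]:
    """Model several scan registers sharing one single-ported fan-out memory.

    Returns (addresses written, clock cycles used). Clears hit_list in place.
    """
    n = len(hit_list)
    seg = -(-n // n_registers)                       # ceiling division
    pos = [r * seg for r in range(n_registers)]      # current position per register
    end = [min((r + 1) * seg, n) for r in range(n_registers)]
    written: List[int] = []
    cycles = 0
    while any(hit_list):                             # OR of all hit bits
        cycles += 1
        requesting = [r for r in range(n_registers)
                      if pos[r] < end[r] and hit_list[pos[r]]]
        granted: Optional[int] = requesting[0] if requesting else None
        for r in range(n_registers):
            if pos[r] >= end[r]:
                continue                             # segment exhausted
            if r == granted:
                written.extend(fan_out.get(pos[r], []))
                hit_list[pos[r]] = 0                 # clear the detected hit
                pos[r] += 1
            elif not hit_list[pos[r]]:
                pos[r] += 1                          # no hit here: keep scanning
            # otherwise the register holds its position, waiting after a clash
    return written, cycles


hits = [0, 1, 0, 0, 0, 0, 1, 0]
fan = {1: [10], 6: [20]}
print(parallel_scan(list(hits), fan, 1))  # ([10, 20], 7)
print(parallel_scan(list(hits), fan, 2))  # ([10, 20], 3)
```

In this example, splitting the list between two registers cuts the scan from 7 clocks to 3; when hits are dense, clashes make registers wait and the gain shrinks.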

When all gate types have been evaluated for the current time interval, all signals are updated by shifting in parallel the words of the input-value register into the corresponding words of the word-line register bank. For 8-valued logic (i.e. 3 bits for each word in the input-value register) this phase requires 3 machine cycles. The input-value register bank can be implemented as a multi-ported memory system which allows several input values to be updated simultaneously, provided that the values are located in different memory banks. Other logic values can be used.

The APPLES bit shift mechanism has made the role of a scheduler redundant. Furthermore, it enables the gate evaluation process to be executed in memory, thereby avoiding the traditional von Neumann bottleneck. Each word pair in array 1b is effectively a processor. The major issues which cause a large overhead in other parallel logic simulators are deadlock and scheduling.

Deadlock occurs in the Chandy-Misra algorithm because of two rules required for temporal correctness: an input waiting rule and an output waiting rule.

Rule one is observed by the update mechanism of APPLES. For any time interval T_i to T_(i+1), words in array 1b reflect the state of wires at time T_i. At the end of the evaluation and update process, all wires have been updated to time T_(i+1). All wires have been incremented by the smallest timestamp, one discrete time unit. Thus at the start of every time interval all gates can be evaluated with confidence that the input values are correct.

The output rule is imposed to ensure that signal values arrive for processing in non-decreasing timestamp order. This is guaranteed in APPLES, since all signal values maintain their temporal order in each word through the shift operation. Unlike in the Chandy-Misra algorithm, deadlock is impossible because every gate can be evaluated at each time interval.

There is no scheduler in the APPLES system. Complex modelling such as inertial delays has confronted schedulers with costly (in time) unscheduling problems: gates which have been scheduled to become active need to be de-scheduled when input signals are found to be shorter than some predefined minimum duration. Together with the normal scheduling tasks, this contributes an onerous overhead.
Fig. 2 displays the equivalent mechanism in APPLES. An AND gate has two inputs, a and b. Assume that:

- signals shorter than three time units have no effect at the output;
- the simulation involves only the binary values 0 and 1;
- each bit in array 1b represents one time unit.

Signal b is constant at value 1, while signal a is at logic 1 for two time units, less than the minimum duration. This is detected by the parallel search generated by the input and mask register combination, and the gate does not become active.
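The Fig. 2 check can be sketched as a single masked search in Python. The sketch assumes an 8-bit binary word per wire, one bit per time unit, with the most recent value leftmost. It also assumes the AND output can only rise once both inputs have been at 1 for the last three time units. The names `WIDTH` and `held_high` are illustrative, not taken from the specification.

```python
WIDTH = 8  # assumed word width: one bit per time unit, binary logic


def masked_match(word: int, search: int, mask: int) -> bool:
    """True when word equals search on every bit position selected by mask."""
    return ((word ^ search) & mask) == 0


def held_high(word: int, min_units: int, width: int = WIDTH) -> bool:
    """Single pattern search: has this input been 1 for the last min_units units?"""
    field = ((1 << min_units) - 1) << (width - min_units)  # leftmost min_units bits
    return masked_match(word, search=field, mask=field)


# Fig. 2 (bit values assumed for illustration): b is constantly 1,
# a has been at 1 for only the two most recent time units.
b = 0b11111111
a = 0b11000000
gate_active = held_high(a, 3) and held_high(b, 3)  # a fails, so the gate stays inactive
```

The short pulse on `a` is rejected by one parallel search, with no event to unschedule afterwards.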

The circuit is now ready to be simulated by APPLES. It is parsed to generate the gate type, delay model and topology information required to initialise associative arrays 1a and 1b and the fan-out vector tables. There is no limit on the number of fan-out gates.

The APPLES processor assumes that the circuit to be simulated has been translated into an equivalent circuit composed solely of 2-input logic gates. Thus every gate has two wires leading into it (an inverter has two wires from one source). These wires are organised as adjacent words in associative array 1b, called a word set.

Associative array 1a contains an identifier for every wire, indicating the type of gate and the input to which the wire is connected. The identifiers are held in an associative memory. When a particular gate evaluation test is executed, the relevant bit patterns placed in Input-reg1a and Mask-reg1a specify the gate type. All wires connected to such gates are identified by a parallel search on associative array 1a. These are used to activate the appropriate words in associative array 1b (the word-line register bank). Thus gate evaluation tests are only active on the relevant word sets.
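The gate-type selection described above can be sketched in Python. A parallel masked search over array 1a yields one enable bit per word, and those bits prime the matching words of array 1b. The 8-bit identifier layout (`ident`, `TYPE_AND`, `TYPE_OR`) is hypothetical, since the text does not define an encoding. The match test is the same one used in the Verilog listing.

```python
from typing import List

# Hypothetical identifier layout for array 1a (not specified in the text):
# bits 7..4 = gate type, bits 3..1 = delay model, bit 0 = which input.
TYPE_AND, TYPE_OR = 0x1, 0x2


def ident(gate_type: int, delay_model: int, which_input: int) -> int:
    return (gate_type << 4) | (delay_model << 1) | which_input


def masked_match(word: int, search: int, mask: int) -> bool:
    # Every bit selected by mask must equal the corresponding search bit.
    return ((word ^ search) & mask) == 0


def word_line_enables(array_1a: List[int], search: int, mask: int) -> List[bool]:
    """Parallel search on array 1a; a match enables the word at the same
    position in array 1b (the word-line register bank)."""
    return [masked_match(w, search, mask) for w in array_1a]


# An AND gate (words 0, 1) and an OR gate (words 2, 3), both delay model 1.
array_1a = [ident(TYPE_AND, 1, 0), ident(TYPE_AND, 1, 1),
            ident(TYPE_OR, 1, 0), ident(TYPE_OR, 1, 1)]
# Bit 0 is masked out so that both inputs of every matching gate are enabled.
enables = word_line_enables(array_1a, search=ident(TYPE_AND, 1, 0), mask=0b1111_1110)
```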

The input-value register bank 17 contains the current input value for each wire. The three leftmost bits of every word in associative array 1b are shifted in from this bank in parallel when all signal values are updated by one time unit. During the update phase of the simulation, the fan-out wires of active gates are identified and the corresponding words in the input-value register bank are amended.

Simulation progresses in discrete time units. For any time interval, each gate type is evaluated by applying tests on associative array 1b and combining and recording the results in the neighbouring register banks. Regardless of the number of gates to be evaluated, this process occupies between 10 machine cycles for the simplest gate delay models and 20 machine cycles for the more complex ones (see Fig. 3). Once the fan-out gate inputs have been amended, all wires are time-incremented through a parallel shift operation lasting 3 machine cycles. In general, for 2^N-valued logic, N shift operations are required to update all signal values.

Fig. 3 illustrates a simulation cycle. In the simulation cycle, the task most affected by circuit size is scanning the hit list: as a circuit grows, the list and the sequential scan time expand proportionately. This is analogous to the conventional communication overhead problem. The APPLES architecture addresses it with a scan mechanism, the multiple scan register structure, which can effectively increase the scan rate as the hit list expands. As will be described hereinafter, another feature of the present invention is the parallelisation of the application of test vectors in the gate evaluation phase. Fig. 4 shows a search test pattern for an AND gate.

The series of signal values that appear on a wire over a period of discrete time units can be represented as a sequence of numbers. For example, in a binary system, suppose a wire has the logic values 1, 1, 0 applied to it at times t0, t1 and t2 respectively, where t0 < t1 < t2. The history of signal values on this wire can be denoted as the bit sequence 011; the further left the bit position, the more recently the value appeared on the wire.

Different delay models involve signal values over various time intervals. In any model, signal values stored in a word which are irrelevant are masked out of the search pattern.

The signal values of a particular wire are updated by shifting all values right by one time unit and placing the current value in the leftmost position. Associative array 1b can shift all its words right in unison. The new current values are shifted into associative array 1b from the input-value register bank.
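The update rule above can be sketched as an integer operation in Python. The history word is shifted right by one value slot and the new current value is placed in the leftmost slot. `shift_in` is an illustrative name. The Verilog model performs the same step as N single-bit shifts (3 for 8-valued logic), whereas this sketch moves a whole slot at once.

```python
def shift_in(word: int, value: int, bits_per_value: int, width: int) -> int:
    """Advance one time unit: discard the oldest value by shifting right one
    slot, then place the new current value in the leftmost slot."""
    word >>= bits_per_value
    word |= value << (width - bits_per_value)
    return word & ((1 << width) - 1)       # keep the word within its width


# Binary example from the text: values 1, 1, 0 applied at t0, t1, t2.
history = 0
for v in (1, 1, 0):
    history = shift_in(history, v, bits_per_value=1, width=3)
```

After the three updates, `history` reads 011 in binary, matching the bit sequence given in the text.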
Fig. 4 illustrates the parallel search patterns for an AND gate transition to logic "0".

With wire signal values represented as bit sequences in associative memory words, the task of gate evaluation can be executed as a sequence of parallel pattern searches. Fig. 4 depicts the situation where 8-valued logic has been employed and the AND gate has been arbitrarily modelled as having a 1 unit delay.

Any gate which has any input satisfying T1 and no input satisfying T2 will transition to 0.

Consequently, to determine whether the output of this gate is going to transition from logic 1 to logic 0, it is necessary to know the signal values at the current time and one time unit earlier. The current values are contained in the leftmost three bits of each word of the word set. In Fig. 4, one input currently holds logic 1 = '111' and the other logic 0 = '000'; both previous values are logic 1.

To ascertain whether this AND gate has an output transition to logic 0, two simple bit pattern tests suffice. If ANY current input value is logic 0 (test T1) and NONE of the previous input values is logic 0 (test T2), then the output will change to logic 0. For this delay model these are the only conditions which will effect this transition. With associative memory, any portion of a word can be active or passive in a search. Thus test T1 is executed by putting '000' into the leftmost three bits of the search register and '111' into the leftmost three bits of the mask register of associative array 1b. Test T2 is executed by essentially the same test on the next three bit positions to the right.
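The search and mask register contents for T1 and T2 can be built as follows. This Python sketch assumes a 32-bit word per wire, the width used in the Verilog model of array 1b. It also assumes 3-bit values packed with the current value leftmost. The helper names (`field_pattern`, `make_word`) are hypothetical.

```python
WIDTH = 32                      # word width of array 1b in the Verilog model
LOGIC0, LOGIC1 = 0b000, 0b111   # 8-valued encodings quoted in the text


def field_pattern(value: int, slot: int, width: int = WIDTH):
    """(search, mask) selecting the 3-bit value in time slot `slot`
    (0 = current value, leftmost; 1 = one time unit earlier; ...)."""
    shift = width - 3 * (slot + 1)
    return value << shift, 0b111 << shift


def masked_match(word: int, search: int, mask: int) -> bool:
    return ((word ^ search) & mask) == 0


def make_word(current: int, previous: int, width: int = WIDTH) -> int:
    """Pack the two most recent 3-bit values into the leftmost six bits."""
    return (current << (width - 3)) | (previous << (width - 6))


T1 = field_pattern(LOGIC0, slot=0)   # current input value is logic 0
T2 = field_pattern(LOGIC0, slot=1)   # previous input value is logic 0

# Fig. 4 word set: input x is 1 now and was 1; input y is 0 now and was 1.
word_x = make_word(LOGIC1, LOGIC1)
word_y = make_word(LOGIC0, LOGIC1)
t1_results = [masked_match(w, *T1) for w in (word_x, word_y)]
t2_results = [masked_match(w, *T2) for w in (word_x, word_y)]
```

Here T1 passes on input y only and T2 passes on neither input. That is exactly the ANY/NONE combination that signals a transition to 0.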

In general, each test is applied one at a time. The result of test T_i on word_j is stored in the i-th bit position of word_j in the test-result register bank 4. A '1' indicates a successful test outcome. For each word set and every test, it is necessary to know whether ANY, BOTH or NONE of the inputs passed the particular test:

- If the i-th bits of word_j and word_(j+1) in the test-result register bank are ORed together and the result is '1', at least one input in the corresponding word set passed test T_i (the ANY condition).
- If the result of the OR is '0', no input passed test T_i (the NONE condition).
- If the i-th bits are ANDed together and the result is '1', BOTH inputs passed test T_i.
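The ANY/NONE/BOTH combination above amounts to an OR or an AND of one bit column across each pair of adjacent words. In this minimal Python sketch, the test-result bank is a list of integers and bit `i` (counted from 0) holds the outcome of the i-th test. That indexing convention is an assumption of the sketch.

```python
from typing import List


def combine(test_results: List[int], i: int, use_and: bool) -> List[int]:
    """For each word set (words j and j+1), OR or AND bit i of the two
    test-result words. OR = 1 means ANY input passed, OR = 0 means NONE,
    AND = 1 means BOTH passed."""
    out = []
    for j in range(0, len(test_results) - 1, 2):
        a = (test_results[j] >> i) & 1
        b = (test_results[j + 1] >> i) & 1
        out.append(a & b if use_and else a | b)
    return out


# Two word sets: gate 0 has one input passing test 0, gate 1 has both passing.
bank = [0b01, 0b00, 0b01, 0b01]
any_t0 = combine(bank, 0, use_and=False)
both_t0 = combine(bank, 0, use_and=True)
any_t1 = combine(bank, 1, use_and=False)   # all zero, so NONE passed test 1
```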



The result-activator register 14 combines results, which are subsequently recorded in the group-result register. The logical interaction is shown in Fig. 5.

The AND or OR operation between the bit positions is dictated by the result-activator register. A '0' in the i-th bit position of the result-activator register performs an OR on the results of test T_i for each word set in the test-result register bank. Conversely, a '1' performs an AND. Each i-th AND or OR operation is enacted in parallel across all word-set test-result register pairs.

The result of applying the result-activator register to each word-set test-result register pair is saved in an associated group-result register. Apart from retaining the results for a particular word set, the group-result registers are composite elements of an associative array. This facilitates a parallel search for a particular result pattern and thus identifies all active gates. These gates are identified as hits (of the search in the group-result register bank) in the group-test hit list.

Returning to the example of the AND gate transition to logic '0', an AND gate fulfils the test requisites if any input passes test T1 and no input passes test T2. It is identified as doing so when its corresponding group-result register has the bit sequence '10' in the first two bit positions.
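The path from the test-result bank, through the result-activator register, to a search of the group-result bank can be sketched end to end in Python. `test_bank[w][k]` holds the outcome of test T_(k+1) on word w. An activator entry of 0 selects OR and 1 selects AND. The group-test search compares only the masked positions. All names are illustrative, not taken from the specification.

```python
from typing import List


def group_results(test_bank: List[List[int]], activator: List[int]) -> List[List[int]]:
    """Combine each word set's test results as dictated by the activator
    (0 = OR, 1 = AND), giving one group-result row per gate."""
    groups = []
    for j in range(0, len(test_bank) - 1, 2):
        row = []
        for k, act in enumerate(activator):
            a, b = test_bank[j][k], test_bank[j + 1][k]
            row.append(a & b if act else a | b)
        groups.append(row)
    return groups


def group_test_hits(groups: List[List[int]], pattern: List[int], mask: List[int]) -> List[int]:
    """Parallel search of the group-result bank; 1 marks an active gate."""
    return [int(all(g[k] == p for k, (p, m) in enumerate(zip(pattern, mask)) if m))
            for g in groups]


# Gate 0 is the Fig. 4 case; gate 1 has both inputs still at 1;
# gate 2 had an input at 0 previously as well, so its output was already 0.
test_bank = [[0, 0], [1, 0],
             [0, 0], [0, 0],
             [1, 1], [1, 1]]
groups = group_results(test_bank, activator=[0, 0])   # ANY for T1, ANY/NONE for T2
hits = group_test_hits(groups, pattern=[1, 0], mask=[1, 1])
```

Only gate 0 matches '10', so it alone enters the group-test hit list.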

The APPLES components involved in the gate evaluation phase, and their sequencing, are shown in Fig. 6.

With the present invention, one of the major features of the method is that each line signal to a target logic gate is stored as a plurality of bits, each representing a delay of one time period. The aggregate bits allow the signal output to, and reception by, the target logic gate to be accurately expressed. Line delays are thus represented in the same manner as the inherent delay of each logic gate. This matters because, as the speed of circuits increases, the time taken to transmit a message between two logic gates can be considerable. Thus the lines, as well as the logic gates, have to be considered as logic entities.

Some logic entities may have delays which exceed the time frame representable in a word of associative array 1b. Larger delays can be modelled by associating a state with a gate type. In this case a gate and its state are defined in associative array 1a. Tests are performed on associative array 1b. When a gate with a given state passes some input value criterion, the fan-out components of the gate may be affected and, in addition, the gate state is amended in associative array 1a. This new state may also cause a new output value to be ascribed to the fan-out list of the gate. The tests that are applied are determined by the gate type and state. In this mechanism, the fan-out list of a gate can include the gate's own entry in associative array 1a.

The state (a binary value) determines whether only the state, or the state and the fan-out gates, are to be updated, by serving as a selector of the gate's fan-out update data files. The state amends the access point relative to the start location of a gate's data files. This enables the gate's normal fan-out list to be either bypassed or used.

On commencement of filling a new time frame (a word in associative array 1b), a special symbol is inserted into the leftmost (most recent time) position. This symbol conveys the input value on the gate and serves as a marker. When the marker reaches the rightmost position in the word, a complete time frame has passed. This can be detected by the normal parallel test-pattern search technique on associative array 1b (see Figure 1).

The interconnect between the logic entities being simulated can be modelled using the large delay model described above. Furthermore, single wires can be modelled by one word instead of two in associative array 1a, associative array 1b and the test-result register bank. Branch points are modelled as separate wires, permitting different branch points to have different delay characteristics.

In effect, each delay is stored as a delay word in an associative memory forming part of the associative memory mechanism. The length of the delay word is ascertained. If the delay word width exceeds the associative register word width, the delay cannot simply be stored in the register. In that case, the number of integer multiples of the register word width contained within the delay word is calculated as a gate state. This gate state is stored in a further state register, in effect associative register (associative array) 1a. The remainder from the calculation is stored in associative register array 1b, alongside the delay words whose width did not exceed the associative register width.

The count then proceeds as follows:

1. When the count of associative register 1b commences, the state register (associative register 1a) is consulted and the delay word is entered into the register. The remainder is ignored for this count of associative register array 1b.
2. At the end of the count of associative register 1b, associative register 1a is decremented by one unit.
3. If this still does not allow the count to take place, the process is repeated. If, however, associative register 1a is cleared, the count continues and the remainder now represents the count required.
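The quotient/remainder scheme above can be checked with a small Python model. The state held in array 1a is the number of whole word-widths in the delay, and the remainder is counted in array 1b once that state reaches zero. As assumptions of this sketch, one complete time frame (the marker reaching the rightmost position) is taken to be exactly `word_width` time units, and the function names are illustrative.

```python
def split_delay(delay: int, word_width: int):
    """Return (state for array 1a, remainder for array 1b)."""
    return divmod(delay, word_width)


def elapsed_units(delay: int, word_width: int) -> int:
    """Count time units as described: while the state is non-zero, let one
    full time frame pass and decrement the state; once it is clear, the
    remainder is the count still required."""
    state, remainder = split_delay(delay, word_width)
    units = 0
    while state > 0:
        units += word_width      # one complete frame passes in array 1b
        state -= 1               # array 1a decremented by one unit
    return units + remainder
```

With a 32-bit word, a 70-unit delay becomes state 2 and remainder 6: two full frames elapse, then six more units are counted.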


Complex delay models such as inertial delays require conventional sequential and parallel logic simulators to unschedule events when some timing criterion is violated. This entails an extremely time-consuming search through an event list. In the present invention, inertial delays only require verification that signals have at least some minimum width in time, which can be implemented as a single pattern search.

An ambiguous delay is more complicated: the statistical behaviour of a gate conveys an uncertainty in the output. A gate output acquires an unknown value between the parameters t_min (M time units) and t_max (N time units). Using 4-valued logic, APPLES detects an initial output change to the unknown value at time t_min, followed by the transition from the unknown value to logic state '0' at time t_max (see Fig. 7). Hazard conditions, where both inputs simultaneously switch to converse values, can also be detected; this too is illustrated in Fig. 7.

For each gate type, the evaluation time $T_{gate\_eval}$ remains constant, typically ranging from 10 to 20 machine cycles. The time to scan the hit list depends on its length and on the number of registers employed in the scan. N scan registers can divide a hit list of H locations into N equal partitions of size $H/N$. Assuming a location can be scanned in 1 machine cycle, the scan time $T_{scan}$ is $H/N$ cycles. Likewise, it is assumed that 1 cycle is sufficient to make 1 fan-out update.

For one scan register partition, the number of updates is $\mathrm{Prob}_{hit} \cdot H/N$. If all N partitions update without interference from other partitions, this also represents the total update time for the entire system. However, while one fan-out is being updated, other registers continue to scan, and hits in these partitions may have to wait and queue. The probability of this happening increases with the number of partitions, and the resulting update time is ${}^{N}C_{1}\,\mathrm{Prob}_{hit}\,H/N$.

A clash occurs when two or more registers simultaneously detect a hit and attempt to access the single-ported fan-out memory. In these circumstances, a semaphore arbitrarily authorises the waiting registers' accesses to memory. The number of clashes during a scan is

$$\text{No. clashes} = (\text{Prob of 2 hits per inspection}) \times H/N + \text{higher-order probabilities} \qquad (1)$$

The low activity rate of circuits (typically 1% to 5% of the total gate count) implies that higher-order probabilities can be ignored. Assume a uniform random distribution of hits, and let $\mathrm{Prob}_{hit}$ be the probability that a register will encounter a hit on an inspection. Then (1) becomes

$$\text{No. clashes} = {}^{N}C_{2}\,\mathrm{Prob}_{hit}^{2} \times H/N \qquad (2)$$

Thus $T_N$, the average total time required to scan and update the fan-out lists of a partition for a particular gate type, is

$$T_N = T_{gate\_eval} + T_{scan} + T_{update} + T_{clash} = T_{gate\_eval} + H/N + {}^{N}C_{1}\,\mathrm{Prob}_{hit}\,H/N + {}^{N}C_{2}\,\mathrm{Prob}_{hit}^{2} \times H/N \qquad (3)$$

Since all partitions are scanned in parallel, $T_N$ also corresponds to the processing time of an N scan register system. Thus the speedup $S_p = T_1/T_N$ of such a system is

$$S_p = \frac{T_1}{T_N} = \frac{T_{gate\_eval} + H + \mathrm{Prob}_{hit}\,H}{T_{gate\_eval} + H/N + {}^{N}C_{1}\,\mathrm{Prob}_{hit}\,H/N + {}^{N}C_{2}\,\mathrm{Prob}_{hit}^{2} \times H/N} \qquad (4)$$

Eqt (4) has been validated empirically. Predicted results are within 20% of observed values for the sample circuits C7552 and C2670, and within 30% for C1908. Non-uniformity of the hit distribution appears to be the cause of this deviation.

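Neglect the fixed overhead $T_{gate\_eval}$, differentiate with respect to N, and ignore second-order and higher powers of $\mathrm{Prob}_{hit}$. The optimum number of scan registers $N_{optimum}$ and the corresponding optimum speedup $S_{optimum}$ are then given by

$$N_{optimum} = \frac{\sqrt{2}}{\mathrm{Prob}_{hit}} \qquad (5)$$

$$S_{optimum} \approx \frac{1}{2.4 \times \mathrm{Prob}_{hit}} \qquad (6)$$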



Thus the optimum number of scan registers is inversely proportional to the probability of a hit being encountered in the hit list. In APPLES, the important processing metric is the rate at which gates can be evaluated and their fan-out lists updated. As the probability of a hit increases, there will be a reciprocal increase in the rate at which gates are updated. Circuits under simulation which happen to exhibit higher hit rates will have a higher update rate.
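Eqs. (3) to (6) can be checked numerically. This Python sketch evaluates $T_N$ and $S_p$ directly, then finds the best N by brute force. `t_gate_eval` defaults to 0, reflecting the assumption that the fixed overhead is neglected when deriving (5) and (6). The hit probability of 0.01 and the list length of 10^6 locations are illustrative values, not measurements. `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt


def scan_time(n: int, h: int, p_hit: float, t_gate_eval: float = 0.0) -> float:
    """Eq. (3): T_N = T_gate_eval + H/N + C(N,1) p H/N + C(N,2) p^2 H/N."""
    return (t_gate_eval + h / n
            + comb(n, 1) * p_hit * h / n
            + comb(n, 2) * p_hit ** 2 * h / n)


def speedup(n: int, h: int, p_hit: float, t_gate_eval: float = 0.0) -> float:
    """Eq. (4): S_p = T_1 / T_N."""
    return scan_time(1, h, p_hit, t_gate_eval) / scan_time(n, h, p_hit, t_gate_eval)


p, h = 0.01, 1_000_000
best_n = max(range(1, 500), key=lambda n: speedup(n, h, p))
print(best_n, sqrt(2) / p)                    # brute force vs Eq. (5)
print(speedup(best_n, h, p), 1 / (2.4 * p))   # brute force vs Eq. (6)
```

With these values, the brute-force optimum lands within one register of $\sqrt{2}/\mathrm{Prob}_{hit}$. The resulting speedup lies within a few percent of $1/(2.4 \times \mathrm{Prob}_{hit})$.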

When the average fan-out time is not one cycle, $\mathrm{Prob}_{hit}$ is multiplied by $f_{out}$, where $f_{out}$ is the effective average fan-out time.

A higher hit rate can also be accomplished through the introduction of extra registers. An increase in registers increases both the hit rate and the number of clashes. The increase halts when the hit rate equals the fan-out update rate, which occurs at $N_{optimum}$. This situation is analogous to a saturated pipeline. Further increases in the number of registers serve only to increase the number of clashes and the waiting lists of registers attempting to update fan-out lists.



Further simulations were carried out, again using a Verilog model of APPLES, on four ISCAS-85 benchmarks using a unit delay model:

- C7552 (4392 gates)
- C2670 (1736 gates)
- C1908 (1286 gates)
- C880 (622 gates)

Each was exercised with 10 random input vectors over a time period ranging from 1,000 to 10,000 machine cycles. Statistics were gathered as the number of scan registers varied from 1 to 50. The speedup as a function of the number of scan registers is shown in Table 1.



(a) Speedup

                   No. Scan Registers
Circuit            15       30       50
C7552              12.5     19.9     24.3
C2670              9.7      13.?     15.9
C1908              8.4      10.8     11.8
C880               7.8      8.3      9.7

(b) Speedup (excluding fixed-size overheads)

                   No. Scan Registers
Circuit            15       30       50
C7552              13.6     24.3     29.6
C2670              12.5     20.0     25.1
C1908              11.8     17.3     20.9
C880               11.1     12.6     15.9

Table 1. Speedup Performance of Benchmarks



Table 1(a) demonstrates that in general the speedup increases with the number of scan registers. The fixed-size overheads of gate evaluation, shifting inputs, etc., tend to penalise the performance of the smaller circuits when a large number of registers is used. A more balanced analysis is obtained by factoring out all fixed time overheads from the simulation results. This reflects the performance of realistic, large circuits, where the fixed overheads will be negligible compared with the scan time. Table 1(b) details the results with this correction. As expected, this correction has less effect on the larger benchmark circuits.
Av. No. Cycles/Gate Processed

                   No. Scan Registers
Circuit            1        15       30       50
C7552              154.6    11.3     6.4      5.2
C2670              101.9    8.0      5.1      3.9
C1908              86.9     6.8      5.1      3.9
C880               49.9     4.9      4.2      3.6

Table 2. Average No. of machine cycles per gate processed

Using the corrected simulated performance statistics, Table 2 displays the average number of machine cycles expended to process a gate. The APPLES system intrinsically detects only active gates, so no futile updates or processing are executed. The data take into account the scan time between hits and the time to update the fan-out lists. As more registers are introduced, the time between hits reduces and the gate update rate increases. Clashes happen, and active gates are effectively queued in a fan-out/update pipeline. The speedup saturates when the fan-out/update rate, governed by the size of the average fan-out list, equals the rate at which gates enter the pipeline.
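Table 2 can be cross-checked against Table 1(b) with simple arithmetic. This rests on the assumption that, once fixed overheads are removed, the speedup for N registers is the ratio of cycles per gate with one register to cycles per gate with N registers. The figures below are transcribed from Table 2. The dictionary and function names are illustrative.

```python
# Average machine cycles per processed gate, transcribed from Table 2.
CYCLES = {
    "C7552": {1: 154.6, 15: 11.3, 30: 6.4, 50: 5.2},
    "C2670": {1: 101.9, 15: 8.0, 30: 5.1, 50: 3.9},
    "C1908": {1: 86.9, 15: 6.8, 30: 5.1, 50: 3.9},
    "C880": {1: 49.9, 15: 4.9, 30: 4.2, 50: 3.6},
}


def implied_speedup(circuit: str, n_regs: int) -> float:
    """Cycles per gate with one scan register divided by cycles with n_regs."""
    row = CYCLES[circuit]
    return row[1] / row[n_regs]


for name in CYCLES:
    print(name, [round(implied_speedup(name, n), 1) for n in (15, 30, 50)])
```

For C7552, the implied values (about 13.7, 24.2 and 29.7) sit close to the 13.6, 24.3 and 29.6 of Table 1(b). For the smaller circuits the agreement is looser.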

The benchmark performance of the circuits also permits an assessment of the validity of the speedup theory. From the speedup measurements in Table 1(b), the corresponding value of $f_{out}$ was calculated using Eqt (7). This value represents the average fan-out update time in machine cycles, and should be constant regardless of the number of scan registers. Furthermore, for the evaluated benchmarks the fan-out ranged from 0 to 3 gates, and the probability of a hit, $\mathrm{Prob}_{hit}$, was found to be 0.01 ± 5%. Within one and a half clock cycles it is possible to update 2 fan-out gates; therefore, depending on the circuit, $f_{out}$ should be in the range 0.5 to 1.5. The calculated values of $f_{out}$ are shown in Table 3.
                   No. Scan Registers
Circuit            15       30       50       Av.
C7552              0.41     0.35     0.88     0.55
C2670              0.52     0.79     1.26     0.86
C1908              0.77     1.21     1.32     1.10
C880               0.16     1.98     1.54     1.22

Table 3. The Average Fan-out Update Time (in machine cycles) for the Benchmarks

The values for $f_{out}$ are in accord with the range expected for the fan-out of these circuits. $f_{out}$ should be constant across a row, but it fluctuates. This is possibly due to the relatively small number of samples and the small size of the circuits: a small perturbation in the distribution of hits in the hit list can significantly affect the speedup figures. In the case of C880, a 10% drop in speedup can effectively lead to a ten-fold increase in $f_{out}$.

For comparison purposes, Table 4 uses data from Banerjee, Parallel Algorithms for VLSI Computer-Aided Design, Prentice-Hall, 1994, which illustrates the speedup performance of various parallel architectures for circuits of similar size to those used in this paper. This indicates that APPLES consistently offers higher speedup.

Architecture                 Synchronous                  Asynchronous
                             Shared        Distributed    Shared / Distributed
Circuit                      Memory        Memory         Memory
Multiplier (4990 gates)      5.0/8         /              5.0/8, 5.8/14
H-FRISC (5060 gates)         3.7/8         /              7.0/8, 8.2/14
S15850 (9772 gates)          /             3.2/8          /
S13207 (7951 gates)          /             3.2/8          /
Adder (400 gates)            /             /              4.5/16, 6.5/32
QRS (1000 gates)             /             /              5.0/16, 7.0/32

Speedup Performance for Various Parallel Systems

Notation a/b, where a = speedup value, b = no. of processors.

Double entries denote two different systems of the same architecture.

TABLE 4 - A speedup comparison of other parallel architectures

The following, from pages 28 to 54, is one example of an implementation of the present invention in software written in Verilog.
Verilog Description of APPLES



Associative Array 1a

Description: Each word of this array holds a bit sequence identifying the gate type and input connection of the wire in the corresponding position in Associative Array 1b. The input/mask register combination defines a gate type that will be activated for searching in Associative Array 1a. Words that successfully match are indicated in a 1-bit column register. The array also has write capabilities.

module Ary_1a (Input_reg1a, Mask_reg1a, Adr_reg1a, Clock,
               Search_enbl1a, Write_enbl1a, Activ_lst1a);

/* Input_reg1a, Mask_reg1a and Adr_reg1a are the Input, Mask and Address
   registers of Associative Array 1a.

   When Search_enbl1a is set, the negative edge of Clock initiates a
   parallel search.

   Activ_lst1a is a column register that indicates those words in
   Associative Array 1a which compared successfully with the search
   pattern. */

parameter Ary_1a_wdth = 7;
parameter Ary1a_size = 16383;
parameter Adr_reg_bits = 13;   // 14 address bits are needed to reach all 16384 words
integer Ary_index;

input Clock, Search_enbl1a, Write_enbl1a;
input [Ary_1a_wdth:0] Input_reg1a, Mask_reg1a;
input [Adr_reg_bits:0] Adr_reg1a;

output [Ary1a_size:0] Activ_lst1a;
reg [Ary1a_size:0] Activ_lst1a;

reg [Ary_1a_wdth:0] Ary1a_ass_mem [0:Ary1a_size];
reg [Ary_1a_wdth:0] Temp_reg;

initial
begin
  // Ary1a.dat is the data file defining the gate and model types in the circuit.
  $readmemb("Ary1a.dat", Ary1a_ass_mem);
  for (Ary_index = 0; Ary_index <= Ary1a_size; Ary_index = Ary_index + 1)
    Activ_lst1a[Ary_index] = 0;
end

always @(negedge Clock)
begin
  if (Search_enbl1a)
  begin
    for (Ary_index = 0; Ary_index <= Ary1a_size; Ary_index = Ary_index + 1)
    begin
      Temp_reg = Ary1a_ass_mem[Ary_index];
      // A word matches when every unmasked bit equals the search bit.
      if ((~Mask_reg1a | (Input_reg1a & Temp_reg) |
           (~Input_reg1a & ~Temp_reg)) == 8'hff)
        Activ_lst1a[Ary_index] = 1;
      else
        Activ_lst1a[Ary_index] = 0;
    end
  end

  if (Write_enbl1a) Ary1a_ass_mem[Adr_reg1a] = Input_reg1a;
end

endmodule

Associative Array 1b

Description: Every word in this array represents the temporal spread of signal values on a specific wire, the most recent values being leftmost in each word. All words can be simultaneously shifted right, effecting a one-unit time increment on all wires. The signal values are updated from a 1-bit column register. The array has parallel search, read and write capabilities.



module Ary_1b (Search_reg1b, Mask_reg1b, Adr_reg1b, Datain_reg1b,
               Dataout_reg1b, Hit_buffr_reg1b, Shft_enbl, Search_enbl1b,
               Write_enbl, Read_enbl, Clock, Input_bit,
               Word_line_enbl);

/* Search_reg1b, Mask_reg1b, Adr_reg1b, Datain_reg1b and Dataout_reg1b are
   the Search, Mask, Address, Data-in and Data-out registers of Associative
   Array 1b. When Search_enbl1b is set, the negative edge of Clock initiates
   a parallel search. Likewise, a write or read operation is executed on the
   negative edge of Clock if Write_enbl or Read_enbl is asserted.

   The parallel search is only active on those words that are primed for
   searching by the Word_line_enbl column register. The bits in this register
   are set/cleared by Activ_lst1a of Associative Array 1a, which effectively
   selects gates of a certain gate type and delay model. Words that match are
   identified by a bit being set in the corresponding position of
   Hit_buffr_reg1b.

   Words are shifted right in parallel, with the leftmost bit being taken
   from Input_bit. */

parameter Ary1b_mem_size = 16383;
parameter Wlr_wrdsize = 31;
parameter Shft_dly = 2;
parameter Adr_reg_bits = 13;

input [Wlr_wrdsize:0] Search_reg1b, Mask_reg1b, Datain_reg1b;
input [Ary1b_mem_size:0] Input_bit, Word_line_enbl;
input [Adr_reg_bits:0] Adr_reg1b;
input Clock;
input Shft_enbl, Search_enbl1b, Write_enbl, Read_enbl;

output [Ary1b_mem_size:0] Hit_buffr_reg1b;
reg [Ary1b_mem_size:0] Hit_buffr_reg1b;

output [Wlr_wrdsize:0] Dataout_reg1b;
reg [Wlr_wrdsize:0] Dataout_reg1b;

reg [Wlr_wrdsize:0] Temp_reg1;
reg [Wlr_wrdsize:0] Wlr_Ass_mem [0:Ary1b_mem_size];
integer Mem_indx;

// Array1b.dat initialises all the words in Array 1b to the Unknown value.
initial $readmemb("Array1b.dat", Wlr_Ass_mem);

always @(negedge Clock)
begin
  if (Shft_enbl)
  begin
    // One-bit right shift of every word; the new leftmost bit comes from Input_bit.
    for (Mem_indx = 0; Mem_indx <= Ary1b_mem_size; Mem_indx = Mem_indx + 1)
    begin
      Temp_reg1 = Wlr_Ass_mem[Mem_indx];
      Temp_reg1 = Temp_reg1 >> 1;
      Temp_reg1[Wlr_wrdsize] = Input_bit[Mem_indx];
      Wlr_Ass_mem[Mem_indx] = Temp_reg1;
    end
  end
  else if (Search_enbl1b)
  begin
    for (Mem_indx = 0; Mem_indx <= Ary1b_mem_size; Mem_indx = Mem_indx + 1)
    begin
      if (Word_line_enbl[Mem_indx])
      begin
        Temp_reg1 = Wlr_Ass_mem[Mem_indx];
        // A word matches when every unmasked bit equals the search bit.
        if ((~Mask_reg1b | (Search_reg1b & Temp_reg1) |
             (~Search_reg1b & ~Temp_reg1)) == 32'hffffffff)
          Hit_buffr_reg1b[Mem_indx] = 1;
        else
          Hit_buffr_reg1b[Mem_indx] = 0;
      end
      else
        Hit_buffr_reg1b[Mem_indx] = 0;
    end
  end
  else if (Write_enbl)
    Wlr_Ass_mem[Adr_reg1b] = Datain_reg1b;
  else if (Read_enbl)
    Dataout_reg1b = Wlr_Ass_mem[Adr_reg1b];
end

endmodule
Test-result register Bank

Description: When a search is executed on Associative Array 1b, if word_j in Array 1b matches the search pattern, then bit_i in word_j of the Test-result register bank will be set; otherwise it is cleared. The Result-activator register specifies the logical combination between pairs of words (a gate's set of inputs). The result of this combination of word pairs is a column register whose length is half the number of words, i.e. one bit per word pair.

module Tst_rslt_reg_bank(Inp_buffr_reg,Trr_wrt_enbl,Comb_enbl,Clock,
                         Out_buffr_reg,Rslt_act_reg,Write_pos,Rset);

/* Inp_buffr_reg is a column of bits describing the outcome of a search on each
   word in Array1b. This bit column is written into a column of the Test-result
   register bank on the negative edge of Clock when Trr_wrt_enbl is asserted. The
   position of this column is defined by Write_pos.

   Word pairs are combined according to the bit sequence in Rslt_act_reg. A '0' in
   bit i of Rslt_act_reg ORs the ith bits in each word pair and a '1' ANDs them; the
   result for each pair is produced in Out_buffr_reg. This combination is executed
   on the negative edge of Clock when Comb_enbl is asserted. Rset resets all the
   bits in the Test-result register bank. */

parameter Trr_word_size=7;
parameter Trr_mem_size=16383;
parameter Trr_out_size=8191;
parameter Trr_wdth_spec=2;

reg [Trr_word_size:0] Trr_array [0:Trr_mem_size];
reg [Trr_word_size:0] Temp_reg1, Temp_reg2;
reg Rslt_action;

input [Trr_mem_size:0] Inp_buffr_reg;
input [Trr_word_size:0] Rslt_act_reg;
input [Trr_wdth_spec:0] Write_pos;
input Clock;
input Trr_wrt_enbl;
input Comb_enbl;
input Rset;

output [Trr_out_size:0] Out_buffr_reg;
reg [Trr_out_size:0] Out_buffr_reg;

integer Bank_index, i;

always @(negedge Clock)
begin
  if (Trr_wrt_enbl)
  begin
    for (Bank_index=0; Bank_index<=Trr_mem_size; Bank_index=Bank_index+1)
    begin
      Temp_reg1=Trr_array[Bank_index];
      Temp_reg1[Write_pos]=Inp_buffr_reg[Bank_index];
      Trr_array[Bank_index]=Temp_reg1;
    end
  end
  else
  if (Comb_enbl)
  begin
    Rslt_action=Rslt_act_reg[Write_pos];
    for (i=0; i<=Trr_word_size; i=i+1)
    begin
      for (Bank_index=0; Bank_index<Trr_mem_size; Bank_index=Bank_index+2)
      begin
        Temp_reg1=Trr_array[Bank_index];
        Temp_reg2=Trr_array[Bank_index+1];
        if (Rslt_action==0)
          Out_buffr_reg[Bank_index/2] = (Temp_reg1[Write_pos] |
                                         Temp_reg2[Write_pos]);
        else
          Out_buffr_reg[Bank_index/2] = Temp_reg1[Write_pos] &
                                        Temp_reg2[Write_pos];
      end
    end
  end
  else
  if (Rset)
  begin
    for (Bank_index=0; Bank_index<=Trr_mem_size; Bank_index=Bank_index+1)
      Trr_array[Bank_index]=8'h00;
  end
end
endmodule
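The Comb_enbl branch above can be mirrored by a small behavioural sketch in Python (not part of the original listing). It assumes, as the description states, that words 2k and 2k+1 of Array1b form one pair, and it models a single column (the one selected by Write_pos) as a list of 0/1 bits; the function name `combine_word_pairs` is hypothetical.

```python
from typing import List


def combine_word_pairs(column: List[int], rslt_action: int) -> List[int]:
    """Combine the bits of word pairs (2k, 2k+1) taken from one Test-result column.

    column      -- one bit per Array1b word, for the column selected by Write_pos
    rslt_action -- the Rslt_act_reg bit for that column: 0 -> OR, 1 -> AND
    Returns one bit per word pair, i.e. half as many bits as words.
    """
    if len(column) % 2:
        raise ValueError("column must contain an even number of words")
    result = []
    for k in range(0, len(column), 2):
        a, b = column[k], column[k + 1]
        result.append(a | b if rslt_action == 0 else a & b)
    return result


if __name__ == "__main__":
    col = [1, 0, 1, 1, 0, 0]               # three word pairs
    print(combine_word_pairs(col, 0))      # OR  -> [1, 1, 0]
    print(combine_word_pairs(col, 1))      # AND -> [0, 1, 0]
```

The output list has half as many entries as the input column, which matches the halved width of Out_buffr_reg (Trr_out_size=8191 against Trr_mem_size=16383).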
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Group-resiilt register Bank 

Description: The result of the combination of word pairs in the Test-resalt register is writtoi as 
a column of bits into the Group-result register bank. When all combination results have been 
generated a parallel search is executed on the Group-result register to ascertain all wm^ pairs in 
Arraylb that passed all the test pattern seardies. 

module Grp_rslt_reg_bank(Grr_inp_reg,Grr_mask_reg,Grr_srch_reg,
                         Clock,Srch_enbl,Wrt_enbl,Write_pos,
                         Grr_hit_list);

/* Grr_inp_reg is shifted as a bit column into a column of the Group-result
   register bank defined by Write_pos. This column write operation is activated on
   the negative edge of Clock when Wrt_enbl is asserted.

   Grr_mask_reg and Grr_srch_reg compose a search pattern enacted on the negative
   edge of Clock when Srch_enbl is set. Pattern matches are indicated in
   Grr_hit_list. The Grr_hit_list is also known as the Group-test Hit list. */

parameter Grr_mem_size=8191;
parameter Grr_word_size=7;
parameter Grr_wdth_spec=2;

input [Grr_mem_size:0] Grr_inp_reg;
input [Grr_word_size:0] Grr_mask_reg,Grr_srch_reg;
input [Grr_wdth_spec:0] Write_pos;
input Clock,Srch_enbl,Wrt_enbl;

output [Grr_mem_size:0] Grr_hit_list;
reg [Grr_mem_size:0] Grr_hit_list;

reg [Grr_word_size:0] Grr_array [0:Grr_mem_size];
reg [Grr_word_size:0] Temp_reg;

integer Bank_index;

always @(negedge Clock)
  if (Wrt_enbl)
  begin
    for (Bank_index=0; Bank_index<=Grr_mem_size; Bank_index=Bank_index+1)
    begin
      Temp_reg=Grr_array[Bank_index];
      Temp_reg[Write_pos]=Grr_inp_reg[Bank_index];
      Grr_array[Bank_index]=Temp_reg;
    end
  end
  else if (Srch_enbl)
    for (Bank_index=0; Bank_index<=Grr_mem_size; Bank_index=Bank_index+1)
    begin
      Temp_reg=Grr_array[Bank_index];
      if (((~Grr_mask_reg) | (Grr_srch_reg & Temp_reg) |
           (~Grr_srch_reg & ~Temp_reg)) == 8'hff)
        Grr_hit_list[Bank_index]=1;
      else
        Grr_hit_list[Bank_index]=0;
    end

endmodule
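The hit condition used by Grp_rslt_reg_bank (and by the search branch of Array1b) is a masked equality test: a word hits when it agrees with the search register in every bit position where the mask register holds a 1. A minimal Python sketch of that test, assuming 8-bit words as set by Grr_word_size=7; `masked_match` and `group_search` are illustrative names, not part of the listing:

```python
from typing import List


def masked_match(word: int, search: int, mask: int, width: int = 8) -> bool:
    """True if `word` equals `search` in every bit position where `mask` is 1."""
    full = (1 << width) - 1
    # Same expression as the Verilog: a bit "agrees" if it is masked out,
    # or if word and search pattern are both 1, or both 0, in that position.
    agree = (~mask | (search & word) | (~search & ~word)) & full
    return agree == full


def group_search(words: List[int], search: int, mask: int, width: int = 8) -> List[int]:
    """Hit list: 1 for every word that matches the masked search pattern, else 0."""
    return [1 if masked_match(w, search, mask, width) else 0 for w in words]


if __name__ == "__main__":
    bank = [0b10110000, 0b10100000, 0b00110000]
    print(group_search(bank, search=0b10110000, mask=0b11110000))  # [1, 0, 0]
    print(group_search(bank, search=0b10110000, mask=0b01110000))  # [1, 0, 1]
```

Clearing a mask bit turns that position into a "don't care", which is why the third word becomes a hit once the top bit is masked out.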

Multiple-response resolver (Version 1.0 Single Scan mode) 

Description: The Multiple-response resolver scans the Group-test Hit list (a 1-bit column
register). The resolver commences a scan by initialising its counter with the top address of the
Hit list. This counter serves as an address register which facilitates reading of every Hit list bit.
If the inspected bit is set, the fan-out list of the associated gate is accessed and updated
appropriately. The bit is then reset. After the reset, or if the bit was already zero, the counter is
decremented to point to the next address in the Hit list. The inspection process is repeated. The
scanning terminates either when all bits have been inspected or when all bits are zero.

module Multiple_res_res(Grr_hit_list,Clock,
                        Reset_ctr,End_scan_flag,Decrmt_enbl,
                        Fan_out_src_reg,Fan_out_size_reg,Rset_hit_fnd_flg,
                        Hit_fnd_flag);

/* The Multiple_response_resolver inspects a new bit of Grr_hit_list on the
   negative edge of Clock while Decrmt_enbl is asserted. Reset_ctr loads the
   resolver's counter with the top location of the Hit list. If the currently
   inspected bit is set, Hit_fnd_flag is asserted and the vector and the size
   (no. of gates) for the fan-out list are loaded into Fan_out_src_reg and
   Fan_out_size_reg, respectively. Scanning halts and only recommences on the
   positive edge of Rset_hit_fnd_flg, which is externally controlled. Scanning
   terminates when all bits have been inspected or reset to zero. This condition
   is indicated by End_scan_flag. */

parameter Grr_mem_size=8191;
parameter Vectr_tbl_adr_reg_bits=13;
parameter Fanout_hdr_tbl_wdth=13;
parameter Max_fan_out=7;
parameter Inp_bnk_size=16383;

input Reset_ctr,Rset_hit_fnd_flg,Clock;
input [Grr_mem_size:0] Grr_hit_list;
input Decrmt_enbl;

output End_scan_flag;
reg End_scan_flag;

output Hit_fnd_flag;
reg Hit_fnd_flag;

output [Vectr_tbl_adr_reg_bits:0] Fan_out_src_reg;
reg [Vectr_tbl_adr_reg_bits:0] Fan_out_src_reg;

output [Max_fan_out:0] Fan_out_size_reg;
reg [Max_fan_out:0] Fan_out_size_reg;

reg [Fanout_hdr_tbl_wdth:0] Fan_out_hdr_tbl [0:Inp_bnk_size];
reg [Vectr_tbl_adr_reg_bits:0] Hit_lst_ctr;
reg [Max_fan_out:0] Fan_out_size_tbl [0:Inp_bnk_size];
reg [Grr_mem_size:0] Hit_lst_buffr;

reg Hit_fnd_ORed_flg,Tst_or_bit;

integer Num_hits,Hit_dist,Sum_hit_dist,Prev_hit_lst_ctr,Avg_dist;

initial $readmemh("Fanout.dat",Fan_out_hdr_tbl);

/* The file Fanout.dat contains the vectors for the start of the fan-out lists for
   every gate in the circuit being simulated. */

initial $readmemh("Fansize.dat",Fan_out_size_tbl);

/* The file Fansize.dat specifies the size of the fan-out list for each gate being
   simulated. */

initial forever
begin
  @(Reset_ctr)
  if (Reset_ctr)
  begin
    Num_hits=0;
    Prev_hit_lst_ctr=Grr_mem_size;
    Sum_hit_dist=0;
    Hit_lst_buffr=Grr_hit_list;
    Tst_or_bit=|Grr_hit_list;
    $display("OR Check=%b",Tst_or_bit);
    Hit_lst_ctr=Grr_mem_size;
    End_scan_flag=0;
    Hit_fnd_flag=0;
    Hit_fnd_ORed_flg=1;
    $display("Initialisation seq executed");
  end
end

always @(negedge Clock)
begin
  if ((Decrmt_enbl) && (!End_scan_flag))
  begin
    Hit_fnd_ORed_flg=|Hit_lst_buffr;
    if ((Hit_lst_ctr>0) && (Hit_fnd_ORed_flg))
    begin
      if (Hit_lst_buffr[Hit_lst_ctr]==1)
      begin
        Num_hits=Num_hits+1;
        Hit_dist=Prev_hit_lst_ctr-Hit_lst_ctr;
        Sum_hit_dist=Hit_dist+Sum_hit_dist;
        $display("Hit distance=%d",Hit_dist," Time=%d",$time);
        Prev_hit_lst_ctr=Hit_lst_ctr;
        Fan_out_size_reg=Fan_out_size_tbl[Hit_lst_ctr];
        Fan_out_src_reg=Fan_out_hdr_tbl[Hit_lst_ctr];
        Hit_fnd_flag=1;
        Hit_lst_buffr[Hit_lst_ctr]=0;
      end
    end

    if ((Hit_lst_ctr>0) && (!Hit_fnd_ORed_flg))
    begin
      End_scan_flag=1;
      $display("No of hits in fan-out list=%d",Num_hits);
      Avg_dist=Sum_hit_dist/Num_hits;
      $display("Average hit distance=%d",Avg_dist);
    end

    if (Hit_lst_ctr==0)
    begin
      if (Hit_lst_buffr[Hit_lst_ctr]==1)
      begin
        Num_hits=Num_hits+1;
        Hit_dist=Prev_hit_lst_ctr-Hit_lst_ctr;
        $display("Hit distance=%d",Hit_dist);
        Prev_hit_lst_ctr=Hit_lst_ctr;
        Sum_hit_dist=Hit_dist+Sum_hit_dist;
        Fan_out_size_reg=Fan_out_size_tbl[Hit_lst_ctr];
        Fan_out_src_reg=Fan_out_hdr_tbl[Hit_lst_ctr];
        Hit_fnd_flag=1;
      end
      End_scan_flag=1;
      $display("No of hits in fan-out list=%d",Num_hits);
      Avg_dist=Sum_hit_dist/Num_hits;
      $display("Average hit distance=%d",Avg_dist);
    end

    Hit_lst_ctr=Hit_lst_ctr-1;
  end
end

always @(posedge Rset_hit_fnd_flg)
begin
  Hit_fnd_flag=0;
end

endmodule
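To make the scan order and the hit-distance statistics printed by this module concrete, here is a simplified Python sketch (hypothetical names, not the Verilog). It treats the Hit list as a Python list indexed from 0, starts at the top address, stops as soon as the OR of the remaining bits is zero, and ignores the external handshake on Rset_hit_fnd_flg. Unlike the Verilog, which uses integer division, the average distance is returned as a float.

```python
from typing import List, Tuple


def single_scan(hit_list: List[int]) -> Tuple[List[int], int]:
    """Scan a Group-test Hit list from the top address down.

    Returns (addresses served in scan order, number of bits inspected).
    Scanning stops once every remaining bit of the working copy is zero.
    """
    buf = list(hit_list)              # working copy, like Hit_lst_buffr
    served: List[int] = []
    inspected = 0
    ctr = len(buf) - 1                # like Hit_lst_ctr, loaded with the top address
    while ctr >= 0 and any(buf):      # any(buf) plays the role of the OR check
        inspected += 1
        if buf[ctr]:
            served.append(ctr)        # the gate's fan-out list would be updated here
            buf[ctr] = 0              # the bit is reset once served
        ctr -= 1
    return served, inspected


def average_hit_distance(served: List[int], top: int) -> float:
    """Mean gap between successive hits, measured from the top address."""
    prev, total = top, 0
    for addr in served:
        total += prev - addr
        prev = addr
    return total / len(served) if served else 0.0


if __name__ == "__main__":
    hits = [0, 1, 0, 0, 1, 0, 0, 0]   # bits set at addresses 1 and 4
    served, inspected = single_scan(hits)
    print(served, inspected)                                # [4, 1] 7
    print(average_hit_distance(served, top=len(hits) - 1))  # 3.0
```

Address 0 is never inspected in the example because the OR check already reports an empty list after address 1 is served, which is the early-termination case in the description.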
Multiple-Response Resolver (Version 2.0 Multiple Scan Mode)

Description: The Multiple-response resolver scans the Group-test Hit list (a 1-bit column
register). The resolver in Multiple Scan Mode consists of several counter (scan) registers. Each is
assigned an equal-size portion of the Group-test Hit list. When the resolver is initialised, all scan
registers point to the top of their respective Hit list segment. The registers are synchronised by a
single clock. The external functionality of the Multiple Scan Mode resolver is identical to that of
the Single Scan Mode version. Internally, the Multiple Scan version uses a Wait semaphore to
queue multiple accesses to the fan-out lists. Registers which clash are queued arbitrarily and
only recommence scanning after gaining permission to update their fan-out lists. Scanning
terminates when all bits have been inspected or all bits are zero.

module Multiple_res_res(Grr_hit_list,Clk,
                        Reset_ctr,End_scan_flag,Decrmt_enbl,
                        Fan_out_src_reg,Fan_out_size_reg,Rset_hit_fnd_flg,
                        Hit_fnd_flag);

/* The Multiple_response_resolver inspects in parallel several bits of
   Grr_hit_list on the negative edge of Clk while Decrmt_enbl is asserted.
   Reset_ctr loads the resolver's scan registers with the top location of each
   respective segment of the Hit list. If any of the currently inspected bits are
   set, Hit_fnd_flag is asserted. The vector and the size (no. of gates) for the
   fan-out list of the segment which has been granted permission are loaded into
   Fan_out_src_reg and Fan_out_size_reg, respectively. Scanning halts for all
   registers awaiting permission. Permission is arbitrarily granted to a segment on
   the positive edge of Rset_hit_fnd_flg, which is externally controlled. For
   registers that have not found a hit, a new bit is inspected on the negative edge
   of Clk. Scanning terminates when all bits have been inspected or reset to zero.
   This condition is indicated by End_scan_flag. */

parameter Grr_mem_size=8191;
parameter Vectr_tbl_adr_reg_bits=13;
parameter Fanout_hdr_tbl_wdth=13;
parameter Max_fan_out=7;
parameter Inp_bnk_size=16383;

input Reset_ctr,Rset_hit_fnd_flg,Clk;
input [Grr_mem_size:0] Grr_hit_list;
input Decrmt_enbl;

output End_scan_flag;
reg End_scan_flag;

output Hit_fnd_flag;
reg Hit_fnd_flag;

output [Vectr_tbl_adr_reg_bits:0] Fan_out_src_reg;
reg [Vectr_tbl_adr_reg_bits:0] Fan_out_src_reg;

output [Max_fan_out:0] Fan_out_size_reg;
reg [Max_fan_out:0] Fan_out_size_reg;

reg [Fanout_hdr_tbl_wdth:0] Fan_out_hdr_tbl [0:Inp_bnk_size];
reg [Max_fan_out:0] Fan_out_size_tbl [0:Inp_bnk_size];
reg [Grr_mem_size:0] Hit_lst_buffr;

reg Hit_fnd_ORed_flg,Tst_or_bit,Mpl_scan_enbl;

integer Num_hits,Num_hits_ratio,Start_time,Finish_time;

reg decrmt_enbl1,decrmt_enbl2,decrmt_enbl3,decrmt_enbl4,mem_access;
reg decrmt_enbl5,decrmt_enbl6,decrmt_enbl7,decrmt_enbl8;
// ...
reg decrmt_enbl25,decrmt_enbl26,decrmt_enbl27,decrmt_enbl28;
reg decrmt_enbl29,decrmt_enbl30;

/* These registers enable a segment to be scanned when asserted. This program
   assumes that the list is divided into 30 equal-sized segments. */

integer c1,c2,c3,c4,c5,c6,c7,c8;
// ...
integer c25,c26,c27,c28,c29,c30,Total;

reg [Vectr_tbl_adr_reg_bits:0] pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8;
// ...
reg [Vectr_tbl_adr_reg_bits:0] pos25,pos26,pos27,pos28,pos29,pos30;
// These are the scan registers for each segment.



parameter upr_lt1=149;
parameter lwr_lt1=0;
parameter upr_lt2=299;
parameter lwr_lt2=150;
parameter upr_lt3=449;
parameter lwr_lt3=300;
parameter upr_lt4=599;
parameter lwr_lt4=450;
parameter upr_lt5=749;
parameter lwr_lt5=600;
parameter upr_lt6=899;
parameter lwr_lt6=750;
// ...
parameter upr_lt27=4049;
parameter lwr_lt27=3900;
parameter upr_lt28=4199;
parameter lwr_lt28=4050;
parameter upr_lt29=4349;
parameter lwr_lt29=4200;
parameter upr_lt30=4392;
parameter lwr_lt30=4350;

// These parameters define the upper and lower limits of the segments of the
// Group-test Hit list.

initial
begin
  pos1=upr_lt1;
  pos2=upr_lt2;
  pos3=upr_lt3;
  pos4=upr_lt4;
  pos5=upr_lt5;
  pos6=upr_lt6;
  // ...
  pos27=upr_lt27;
  pos28=upr_lt28;
  pos29=upr_lt29;

  decrmt_enbl1=1;
  decrmt_enbl2=1;
  decrmt_enbl3=1;
  decrmt_enbl4=1;
  decrmt_enbl5=1;
  decrmt_enbl6=1;
  decrmt_enbl7=1;
  // ...
  decrmt_enbl27=1;
  decrmt_enbl28=1;
  decrmt_enbl29=1;
  decrmt_enbl30=1;

  c1=0;
  c2=0;
  c3=0;
  c4=0;
  c5=0;
  // ...
  c27=0;
  c28=0;
  c29=0;
  c30=0;

  mem_access=1;
end

initial $readmemh("Fanout.dat",Fan_out_hdr_tbl);

/* The file Fanout.dat contains the vectors for the start of the fan-out lists for
   every gate in the circuit being simulated. */

initial $readmemh("Fansize.dat",Fan_out_size_tbl);

/* The file Fansize.dat specifies the size of the fan-out list for each gate being
   simulated. */

initial forever
begin
  @(Reset_ctr)
  if (Reset_ctr)
  begin
    Num_hits=0;
    Hit_lst_buffr=Grr_hit_list;
    Tst_or_bit=|Grr_hit_list;
    $display("OR Check=%b",Tst_or_bit);
    End_scan_flag=0;
    Hit_fnd_flag=0;
    Hit_fnd_ORed_flg=1;

    pos1=upr_lt1;
    pos2=upr_lt2;
    pos3=upr_lt3;
    pos4=upr_lt4;
    pos5=upr_lt5;
    pos6=upr_lt6;
    // ...
    pos27=upr_lt27;
    pos28=upr_lt28;
    pos29=upr_lt29;
    pos30=upr_lt30;

    decrmt_enbl1=1;
    decrmt_enbl2=1;
    decrmt_enbl3=1;
    decrmt_enbl4=1;
    decrmt_enbl5=1;
    decrmt_enbl6=1;
    // ...
    decrmt_enbl27=1;
    decrmt_enbl28=1;
    decrmt_enbl29=1;
    decrmt_enbl30=1;

    c1=0;
    c2=0;
    c3=0;
    c4=0;
    c5=0;
    c6=0;
    // ...
    c27=0;
    c28=0;
    c29=0;
    c30=0;

    mem_access=1;

    $display("Initialisation seq executed");
    Start_time=$time;
  end
end

always @(posedge Decrmt_enbl)
begin
  Mpl_scan_enbl=1;
end

always @(posedge Rset_hit_fnd_flg)
begin
  Hit_fnd_flag=0;
  mem_access=1;
end

always @(negedge Clk)
begin
  if (!End_scan_flag)
  begin
    Hit_fnd_ORed_flg=|Hit_lst_buffr;
    if (!Hit_fnd_ORed_flg)
    begin
      End_scan_flag=1;
      Mpl_scan_enbl=0;
    end
  end

  if ((Mpl_scan_enbl) && (Hit_fnd_ORed_flg))
  begin
    if (decrmt_enbl1)
    begin
      if (Hit_lst_buffr[pos1]==1)
      begin
        Hit_lst_buffr[pos1]=0;
        decrmt_enbl1=0;
        if (!mem_access)
        begin
          c1=c1+1;
          $display("Clash1 c1=%d",c1);
        end
        wait (mem_access);
        mem_access=0;
        Num_hits=Num_hits+1;
        Fan_out_size_reg=Fan_out_size_tbl[pos1];
        Fan_out_src_reg=Fan_out_hdr_tbl[pos1];
        Hit_fnd_flag=1;
        Hit_lst_buffr[pos1]=0;
        if (pos1>lwr_lt1)
        begin
          pos1=pos1-1;
          decrmt_enbl1=1;
        end
      end
      else
      begin
        if (pos1>lwr_lt1)
        begin
          pos1=pos1-1;
        end
        else
          decrmt_enbl1=0;
      end
    end

    // ...

    if (decrmt_enbl30)
    begin
      if (Hit_lst_buffr[pos30]==1)
      begin
        Hit_lst_buffr[pos30]=0;
        decrmt_enbl30=0;
        if (!mem_access)
        begin
          c30=c30+1;
          $display("Clash30 c30=%d",c30);
        end
        wait (mem_access);
        mem_access=0;
        Num_hits=Num_hits+1;
        Fan_out_size_reg=Fan_out_size_tbl[pos30];
        Fan_out_src_reg=Fan_out_hdr_tbl[pos30];
        Hit_fnd_flag=1;
        Hit_lst_buffr[pos30]=0;
        if (pos30>lwr_lt30)
        begin
          pos30=pos30-1;
          decrmt_enbl30=1;
        end
      end
      else
      begin
        if (pos30>lwr_lt30)
        begin
          pos30=pos30-1;
        end
        else
          decrmt_enbl30=0;
      end
    end
  end
end

always @(posedge End_scan_flag)
begin
  Finish_time=$time;
end
endmodule
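The segment limits listed in this module (0 to 149, 150 to 299, ..., 4350 to 4392) are consistent with splitting 4393 Hit-list entries into 150-entry segments, the last one shorter. The Python sketch below (an illustration, not the Verilog; all names are hypothetical) derives those bounds and models one scan register per segment, all stepping on the same clock. The listing says clashing segments are queued arbitrarily; the sketch assumes, for determinism, that hits found in the same cycle are served in ascending segment order, and it does not model the cycles spent waiting for access to the fan-out lists.

```python
from typing import List, Tuple


def segment_bounds(n_bits: int, seg_size: int) -> List[Tuple[int, int]]:
    """(lower, upper) limits of each segment; the last segment may be shorter."""
    return [(lo, min(lo + seg_size - 1, n_bits - 1)) for lo in range(0, n_bits, seg_size)]


def multi_scan(hit_list: List[int], seg_size: int) -> Tuple[List[int], int, int]:
    """Scan all segments in parallel, one bit per segment per clock.

    Returns (addresses served, clock cycles used, clashes), where a clash is a
    hit that has to queue behind another hit found in the same cycle.
    """
    buf = list(hit_list)
    segs = segment_bounds(len(buf), seg_size)
    pos = [upr for _, upr in segs]        # one scan register per segment, at its top
    active = [True] * len(segs)           # like decrmt_enbl1 ... decrmt_enblN
    served: List[int] = []
    cycles = clashes = 0
    while any(buf) and any(active):
        cycles += 1
        found = []
        for s, (lwr, _) in enumerate(segs):
            if not active[s]:
                continue
            if buf[pos[s]]:
                buf[pos[s]] = 0
                found.append(pos[s])
            if pos[s] > lwr:
                pos[s] -= 1
            else:
                active[s] = False         # segment exhausted
        clashes += max(0, len(found) - 1)
        served.extend(found)              # assumption: ascending segment order
    return served, cycles, clashes


if __name__ == "__main__":
    bounds = segment_bounds(4393, 150)
    print(len(bounds), bounds[0], bounds[-1])                  # 30 (0, 149) (4350, 4392)
    print(multi_scan([0, 1, 0, 0, 0, 1, 1, 0], seg_size=4))    # ([6, 1, 5], 3, 1)
```

On the 8-bit example the single-scan sketch given for Version 1.0 would inspect 7 bits, while two 4-bit segments finish in 3 cycles at the cost of one clash, which is the trade-off the Wait semaphore exists to manage.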



Fan-out Generator module 

Description: When a hit has been detected in the Group-test Hit list, the address within the scan
register selects a vector (from the Fan-out hdr table) which locates the start of a fan-out list for
the current active gate. The address register of this module is loaded with the address of the
header of the fan-out list. The size of this fan-out list and the updated signal value to be
transmitted are also conveyed to the module. The module proceeds to effect all changes in the
fan-out lists.



module Fan_out_gen(Fan_out_load,Fan_out_gen_flg,Reset_gen,Update_val_in,
                   Clock,Update_val_out,Fan_out_size_reg,
                   Fan_out_adr_reg,Out_adr_reg);

/* The address in Fan_out_vector_tbl of the header of the fan-out list and the
   number of fan-out elements are contained in Fan_out_adr_reg and Fan_out_size_reg,
   respectively. These are loaded on the positive edge of Fan_out_load. On the
   successive negative edge(s) of Clock the address of a fan-out wire is generated in
   Out_adr_reg. Fan_out_gen_flg is set while a fan-out list is being generated and is
   cleared when the end of the list is reached. This flag is also cleared by the
   positive edge of Reset_gen. The signal value to be conveyed to the fan-out list is
   transferred to and transmitted by the module in Update_val_in and Update_val_out,
   respectively. */

parameter Vectr_tbl_wrd_size=13;
parameter Vectr_tbl_size=16383;
parameter Inp_val_wdth=2;
parameter Max_fan_out=7;
parameter Vectr_tbl_adr_size=13;

input Fan_out_load,Reset_gen,Clock;
input [Inp_val_wdth:0] Update_val_in;
input [Max_fan_out:0] Fan_out_size_reg;
input [Vectr_tbl_adr_size:0] Fan_out_adr_reg;

output Fan_out_gen_flg;
reg Fan_out_gen_flg;

output [Inp_val_wdth:0] Update_val_out;
reg [Inp_val_wdth:0] Update_val_out;

output [Vectr_tbl_wrd_size:0] Out_adr_reg;
reg [Vectr_tbl_wrd_size:0] Out_adr_reg;

reg [Vectr_tbl_wrd_size:0] Fan_out_vector_tbl [0:Vectr_tbl_size];
reg [Vectr_tbl_wrd_size:0] List_pos;
reg [Max_fan_out:0] Counter;

initial $readmemh("Fanvcr.dat",Fan_out_vector_tbl);

// Fanvcr.dat contains the vectors of the signals in the fan-out lists for every gate.

initial forever
begin
  @(Reset_gen)
  if (Reset_gen)
  begin
  end
end

always @(posedge Fan_out_load)
begin
  if (!Reset_gen)
  begin
    Counter=Fan_out_size_reg;
    List_pos=Fan_out_adr_reg;
    Update_val_out=Update_val_in;
    Fan_out_gen_flg=1;
  end
end

always @(negedge Clock)
begin
  if (!Reset_gen && Fan_out_gen_flg)
  begin
    if (Counter>0)
    begin
      Out_adr_reg=Fan_out_vector_tbl[List_pos];
      List_pos=List_pos+1;
      Counter=Counter-1;
    end
    else
      Fan_out_gen_flg=0;
  end
end
endmodule
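A minimal Python sketch of the address sequence this module produces (the function name is hypothetical and the table contents in the example are made up for illustration): once List_pos and Counter are loaded, each negative clock edge emits one entry of Fan_out_vector_tbl until Counter reaches zero.

```python
from typing import List


def generate_fan_out(vector_tbl: List[int], list_start: int, size: int) -> List[int]:
    """Addresses emitted in Out_adr_reg, one per negative clock edge.

    vector_tbl -- stand-in for Fan_out_vector_tbl (wire addresses)
    list_start -- header address of the fan-out list (Fan_out_adr_reg)
    size       -- number of fan-out elements (Fan_out_size_reg)
    """
    out = []
    pos, counter = list_start, size    # List_pos and Counter, loaded on Fan_out_load
    while counter > 0:
        out.append(vector_tbl[pos])    # Out_adr_reg = Fan_out_vector_tbl[List_pos]
        pos += 1
        counter -= 1
    return out                         # Counter is now 0: Fan_out_gen_flg would clear


if __name__ == "__main__":
    table = [10, 11, 12, 20, 21]       # made-up table contents
    print(generate_fan_out(table, list_start=3, size=2))  # [20, 21]
```

Each emitted address is the wire that the Input-value Bank then overwrites with Update_val_out, so a gate with k fan-outs costs k clock cycles here.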
Input-value Bank

Description: The bank contains the current values of all the signals in the circuit. Each location
in the bank corresponds to a wire. Since a word at any location is 3 bits wide, up to 8-valued
logic can be simulated (this can be augmented by increasing the word width). The current value
of any wire is shifted from this bank into Array1b when time is incremented. This is done in
parallel. Only wire values that have changed in the current time interval are updated.

module Input_val_bank(Inp_val_reg,Adr_reg,Clock,Shft_enbl,Wrt_enbl,
                      Out_buffr_reg);

/* Inp_val_reg contains the new value of a signal (i.e. word) in Inp_val_ary. The
   location of the wire is specified in Adr_reg and the write operation takes effect
   on the positive edge of Clock if Wrt_enbl is asserted. If Shft_enbl is asserted
   then the right-most bit of every location is shifted into the 1-bit column
   register Out_buffr_reg on the positive edge of Clock. All shifted bits are also
   written back into the left-most bit of Inp_val_ary (i.e. a rotation); thus all
   current values have been retained after the shifting-out process. */

parameter Inp_val_wdth=2;
parameter Adr_reg_bits=13;
parameter Inp_bnk_size=16383;
parameter Lsr7552_Inp_bnk_size=8784;

input Clock,Shft_enbl,Wrt_enbl;
input [Inp_val_wdth:0] Inp_val_reg;
input [Adr_reg_bits:0] Adr_reg;

output [Inp_bnk_size:0] Out_buffr_reg;
reg [Inp_bnk_size:0] Out_buffr_reg;

reg [Inp_val_wdth:0] Inp_val_ary [0:Inp_bnk_size];

reg [Inp_val_wdth:0] Temp_reg;
reg Temp_bit;

integer Inp_ary_indx, i;

initial $readmemb("Inpval.dat",Inp_val_ary);

/* Inpval.dat is the file which initialises the current input values of all gates
   in the simulated circuit. All values are assigned 'Unknown' logic values except
   those primary inputs which are assigned logic '0' or '1'. */

always @(posedge Clock)
begin
  if (Shft_enbl)
  begin
    for (Inp_ary_indx=0; Inp_ary_indx<=Lsr7552_Inp_bnk_size;
         Inp_ary_indx=Inp_ary_indx+1)
    begin
      Temp_reg=Inp_val_ary[Inp_ary_indx];
      Temp_bit=Temp_reg[0];
      Out_buffr_reg[Inp_ary_indx]=Temp_bit;
      Temp_reg[1:0]=Temp_reg[Inp_val_wdth:1];
      Temp_reg[Inp_val_wdth]=Temp_bit;
      Inp_val_ary[Inp_ary_indx]=Temp_reg;
    end
    $display("(shft) time=%d",$time);
  end
  else
  if (Wrt_enbl)
  begin
    Inp_val_ary[Adr_reg]=Inp_val_reg;
  end
end
endmodule
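The rotation performed when Shft_enbl is asserted can be sketched in Python as follows, assuming 3-bit words (Inp_val_wdth=2); `shift_out` is a hypothetical name. Each call corresponds to one positive clock edge: the right-most bit of every word goes into the output column and re-enters the same word on the left, so after three shifts every word holds its original value again.

```python
from typing import List, Tuple

WIDTH = 3  # bits per wire value (Inp_val_wdth + 1)


def shift_out(bank: List[int]) -> Tuple[List[int], List[int]]:
    """One shift with Shft_enbl asserted: (bit column shifted out, rotated bank)."""
    column, rotated = [], []
    for word in bank:
        lsb = word & 1                                      # right-most bit
        column.append(lsb)                                  # goes to Out_buffr_reg
        rotated.append((word >> 1) | (lsb << (WIDTH - 1)))  # re-enters on the left
    return column, rotated


if __name__ == "__main__":
    bank = [0b101, 0b011]
    for _ in range(WIDTH):
        col, bank = shift_out(bank)
        print(col)                      # [1, 1], then [0, 1], then [1, 0]
    print(bank == [0b101, 0b011])       # True: values retained after WIDTH shifts
```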

The Sequence Logic of the APPLES Processor 

parameter Nibl=8;
parameter Ary_la_wdth=7;
parameter Ary_lb_adr_reg_wdth=13;
parameter Ary_la_size=16383;
parameter Ary_lb_size=16383;
parameter Eval_ptrn_tbl_size=63;
parameter Eval_ptrn_vctr_tbl_size=31;
parameter Num_tst_wdth=7;
parameter Num_tst_ptrn_tbl_size=31;
parameter Gate_maskla_tbl_size=31;
parameter Gate_inptla_tbl_size=31;
parameter Trr_ptrn_tbl_size=31;
parameter Grr_ptrn_tbl_size=31;
parameter Out_val_tbl_size=31;

parameter Wlr_wrdsize=31;
parameter Trr_wdth_spec=2;
parameter Trr_word_size=7;
parameter Grr_mem_size=8191;
parameter Grr_wdth_spec=2;
parameter Grr_word_size=7;
parameter Iu_word_size=7;
parameter Iu_wdth_spec=2;
parameter Vectr_tbl_adr_reg=13;
parameter Max_fan_out=7;
parameter Inp_val_wdth=2;
parameter Vectr_tbl_adr_size=16383;

parameter Index_reg_wdth=7;

parameter Num_tst_seq=12;   // No of gates X No Transitions
parameter Num_tst_cnt_wdth=3;
parameter Init_shft_val=3;
parameter Shft_cnt_wdth=3;

wire Clock;

wire [Ary_la_size:0] Wrd_ln_activ_lst,Trr_bnk_inp_reg;
wire [Ary_lb_size:0] Inval_unit_out_reg;
wire [Grr_mem_size:0] Grr_bnk_inp_reg,Grr_bnk_hit_lst;
wire [Max_fan_out:0] Mrr_unit_fan_out_size_reg;
wire [Vectr_tbl_adr_reg:0] Mrr_unit_fan_out_src_reg;
wire [Inp_val_wdth:0] Fo_gen_unit_val_out;
wire [Vectr_tbl_adr_size:0] Fo_gen_unit_out_adr_reg;

reg Tst_seq_strt;

reg e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,
    e15,e16,e16a,e16b,e17,e18,e19,e20,e21,e22,e23,e24,e25,e26,e27,e28,e29,
    Deact_srchla,Gate_eval_init_proc;

reg [Index_reg_wdth:0] Ept_i,Epvt_i,Ntpt_i,Gmlat_i,Gilat_i,
                       Tpt_i,Grit_i,Gmt_i,Ovt_i;

reg [Wlr_wrdsize:0] Eval_ptrn_tbl [0:Eval_ptrn_tbl_size];
reg [Wlr_wrdsize:0] Eval_ptrn_vctr_tbl [0:Eval_ptrn_vctr_tbl_size];
reg [Num_tst_wdth:0] Num_tst_ptrn_tbl [0:Num_tst_ptrn_tbl_size];
reg [Ary_la_wdth:0] Gate_maskla_tbl [0:Gate_maskla_tbl_size];
reg [Ary_la_wdth:0] Gate_inptla_tbl [0:Gate_inptla_tbl_size];
reg [Trr_word_size:0] Trr_ptrn_tbl [0:Trr_ptrn_tbl_size];
reg [Grr_word_size:0] Grr_inpt_tbl [0:Grr_ptrn_tbl_size];
reg [Grr_word_size:0] Grr_mask_tbl [0:Grr_ptrn_tbl_size];
reg [Inp_val_wdth:0] Out_val_tbl [0:Out_val_tbl_size];

reg [Grr_word_size:0] Grr_bnk_search_reg,Grr_bnk_mask_reg;
reg [Grr_wdth_spec:0] Grr_bnk_wrt_pos;
reg [Trr_wdth_spec:0] Trr_bnk_wrt_pos;
reg [Trr_word_size:0] Trr_rslt_act_reg,Trr_rslt_act_and_0;
reg [Iu_word_size:0] Inval_unit_adr_reg;
reg [Iu_wdth_spec:0] Fo_gen_unit_val_in,Inval_unit_in_reg;

reg Search_ary_la,Write_enbl_la,Ary_lb_wrt_enbl,Wlr_bnk_search_enbl,Shft_ary_lb,
    Ary_lb_rd_enbl,Trr_bnk_wrt_enbl,Trr_bnk_comb_enbl,Trr_bnk_rset,
    Grr_bnk_search_enbl,Grr_bnk_wrt_enbl,Mrr_unit_rset,Mrr_unit_decrmt_enbl,
    Mrr_unit_rset_hit_fnd_flg,Fo_gen_unit_load,Fo_gen_unit_rset,
    Inval_unit_shft_enbl,Inval_unit_wrt_enbl;

reg [Ary_la_wdth:0] Inp_regla,Mask_regla,Adr_regla;
reg [Wlr_wrdsize:0] Inp_reg_lb,Search_reg_lb,Mask_reg_lb;
reg [Ary_lb_adr_reg_wdth:0] Adr_reg_lb;
reg [Num_tst_cnt_wdth:0] Num_tst_cnt;
reg [Shft_cnt_wdth:0] Shft_cnt;



Ary_la Gate_id_bnk(Inp_regla,Mask_regla,Adr_regla,Clock,
                   Search_ary_la,Write_enbl_la,Wrd_ln_activ_lst);

Ary_lb Wrd_ln_reg_bnk(Search_reg_lb,Mask_reg_lb,Adr_reg_lb,
                      Inp_reg_lb,Out_reg_lb,Trr_bnk_inp_reg,Shft_ary_lb,
                      Wlr_bnk_search_enbl,Ary_lb_wrt_enbl,Ary_lb_rd_enbl,
                      Clock,Inval_unit_out_reg,Wrd_ln_activ_lst);

Tst_rslt_reg_bank Trr_bnk(Trr_bnk_inp_reg,Trr_bnk_wrt_enbl,Trr_bnk_comb_enbl,
                          Clock,Grr_bnk_inp_reg,Trr_rslt_act_reg,
                          Trr_bnk_wrt_pos,Trr_bnk_rset);

Grp_rslt_reg_bank Grr_bnk(Grr_bnk_inp_reg,Grr_bnk_mask_reg,
                          Grr_bnk_search_reg,Clock,Grr_bnk_search_enbl,
                          Grr_bnk_wrt_enbl,Grr_bnk_wrt_pos,Grr_bnk_hit_lst);

Multiple_res_res Mrr_unit(Grr_bnk_hit_lst,Clock,Mrr_unit_rset,
                          Mrr_unit_end_scan_flg,Mrr_unit_decrmt_enbl,
                          Mrr_unit_fan_out_src_reg,
                          Mrr_unit_fan_out_size_reg,
                          Mrr_unit_rset_hit_fnd_flg,
                          Mrr_unit_hit_fnd_flag);

Fan_out_gen Fo_gen_unit(Fo_gen_unit_load,Fo_gen_unit_flg,Fo_gen_unit_rset,
                        Fo_gen_unit_val_in,Clock,Fo_gen_unit_val_out,
                        Mrr_unit_fan_out_size_reg,Mrr_unit_fan_out_src_reg,
                        Fo_gen_unit_out_adr_reg);

Input_val_bank Inval_unit(Fo_gen_unit_val_out,Fo_gen_unit_out_adr_reg,Clock,
                          Inval_unit_shft_enbl,Inval_unit_wrt_enbl,
                          Inval_unit_out_reg);

Clk_gen Clk_unit(Clock);

integer i,Tst_num,iter_cnt;
initial
begin
  $display("Initialisation commencing.");
  $readmemb("Ep_tbl.dat",Eval_ptrn_tbl);
  $display("Ep_tbl.dat loaded.");
  $readmemh("Epv_tbl.dat",Eval_ptrn_vctr_tbl);
  $display("Epv_tbl.dat loaded.");
  $readmemh("Ntp_tbl.dat",Num_tst_ptrn_tbl);
  $display("Ntp_tbl.dat loaded.");
  $readmemb("Gila_tbl.dat",Gate_inptla_tbl);
  $display("Gila_tbl.dat loaded.");
  $readmemb("Gmla_tbl.dat",Gate_maskla_tbl);
  $display("Gmla_tbl.dat loaded.");
  $readmemb("Tp_tbl.dat",Trr_ptrn_tbl);
  $display("Tp_tbl.dat loaded.");
  $readmemb("Gi_tbl.dat",Grr_inpt_tbl);
  $display("Gi_tbl.dat loaded.");
  $readmemb("Gm_tbl.dat",Grr_mask_tbl);
  $display("Gm_tbl.dat loaded.");
  $readmemb("Ov_tbl.dat",Out_val_tbl);
  $display("Ov_tbl.dat loaded.");

  $display("Table initialisation sequence completed");

  Gate_eval_init_proc=1;
  iter_cnt=0;

  Num_tst_cnt=Num_tst_seq;
  Inval_unit_shft_enbl=0;

  Ept_i=8'h00; Epvt_i=8'h00; Ntpt_i=8'h00;
  Gmlat_i=8'h00; Gilat_i=8'h00; Tpt_i=8'h00;
  Grit_i=8'h00; Gmt_i=8'h00; Ovt_i=8'h00;
end

always @(negedge Clock)
  if (Gate_eval_init_proc)
  begin
    $display("Gate_eval_init_proc @ time=%d",$time);
    iter_cnt=iter_cnt+1;

    $display("Iteration count=%d",iter_cnt);
    Gate_eval_init_proc=0;
    Deact_srchla=0;

    e0=0; e1=0; e2=0; e3=0; e4=0; e5=0; e6=0;
    e7=0; e8=0; e9=0; e10=0; e11=0; e12=0; e13=0;
    e14=0; e15=0; e16=0; e16a=0; e16b=0; e17=0;
    e18=0; e19=0; e20=0; e21=0; e22=0;

    Inp_regla=Gate_inptla_tbl[Gilat_i];
    Mask_regla=Gate_maskla_tbl[Gmlat_i];
    Tst_num=Num_tst_ptrn_tbl[Ntpt_i];
    Ept_i=Eval_ptrn_vctr_tbl[Epvt_i];
    Mrr_unit_decrmt_enbl=0;
    Tst_seq_strt=1;
    Wlr_bnk_search_enbl=0;
    Inval_unit_wrt_enbl=0;
  end

always @(posedge Clock)
begin
  if (Tst_seq_strt)
  begin
    Trr_bnk_rset=1;
    Search_ary_la=1;
    e0=1;
    Tst_seq_strt=0;
  end
end

always @(negedge Clock)
begin
  if (e0)
  begin
    e0=0;
    Deact_srchla=1;
  end
end

always @(posedge Clock)
begin
  if (Deact_srchla)
  begin
    Trr_bnk_rset=0;
    Deact_srchla=0;
    Search_ary_la=0;
    e1=1;
    i=Trr_word_size;
  end
end

always @(negedge Clock)
begin
  if (e1)
  begin
    e1=0;
    e2=1;
  end
end

always @(posedge Clock)
begin
  if (e2)
  begin
    Wlr_bnk_search_enbl=1;
    Search_reg_lb=Eval_ptrn_tbl[Ept_i];
    Mask_reg_lb=Eval_ptrn_tbl[Ept_i+1];
    e2=0;
    e3=1;
  end
end

always @(negedge Clock)
begin
  if (e3)
  begin
    e3=0;
    e4=1;
  end
end

always @(posedge Clock)
begin
  if (e4)
  begin
    Trr_bnk_wrt_enbl=1;
    Trr_bnk_wrt_pos=i;
    Wlr_bnk_search_enbl=0;
    e4=0;
    e5=1;
  end
end
always @(negedge Clock)
begin
  if (e5)
  begin
    e5=0;
    e6=1;
  end
end

always @(posedge Clock)
begin
  if (e6)
  begin
    Tst_num=Tst_num-1;
    i=i-1;
    e6=0;
    if (Tst_num>0)
    begin
      e1=1;
      Ept_i=Ept_i+2;
      $display("Ept_i (updated)=%d",Ept_i);
      Trr_bnk_wrt_enbl=0;
    end
    else
    begin
      Trr_bnk_wrt_enbl=0;
      i=Trr_word_size;
      Trr_rslt_act_reg=Trr_ptrn_tbl[Tpt_i];
      Tst_num=Num_tst_ptrn_tbl[Ntpt_i];
      e7=1;
    end
  end
end

always @(negedge Clock)
begin
  if (e7)
  begin
    e7=0;
    e8=1;
  end
end

always @(posedge Clock)
begin
  if (e8)
  begin
    Trr_bnk_comb_enbl=1;
    Trr_bnk_wrt_pos=i;
    e8=0;
    e9=1;
    $display("Commencement of TRR tests for Gate type=%b",Inp_regla,
             " at time=%d",$time);
  end
end

always @(negedge Clock)
begin
  if (e9)
  begin
    e9=0;
    e10=1;
  end
end
always @(posedge Clock)
begin
  if (e10)
  begin
    Trr_bnk_comb_enbl=0;
    Grr_bnk_wrt_enbl=1;
    Grr_bnk_wrt_pos=i;
    e10=0;
    e11=1;
  end
end

always @(negedge Clock)
begin
  if (e11)
  begin
    e11=0;
    e12=1;
  end
end

always @(posedge Clock)
begin
  if (e12)
  begin
    Tst_num=Tst_num-1;
    i=i-1;
    e12=0;
    if (Tst_num>0)
    begin
      Trr_bnk_comb_enbl=1;
      Trr_bnk_wrt_pos=i;
      Grr_bnk_wrt_enbl=0;
    end
    else
    begin
      e13=1;
      Grr_bnk_wrt_enbl=0;
    end
  end
end

always @(negedge Clock)
begin
  if (e13)
  begin
    e13=0;
    e14=1;
    $display("Termination of Trr tests for Gate type=%b",Inp_regla,
             " at time=%d",$time);
  end
end

always @(posedge Clock)
begin
  if (e14)
  begin
    Grr_bnk_search_reg=Grr_inpt_tbl[Grit_i];
    Grr_bnk_mask_reg=Grr_mask_tbl[Gmt_i];
    Grr_bnk_search_enbl=1;
    Fo_gen_unit_rset=1;
    e14=0;
    e15=1;
  end
end
always @(negedge Clock)
begin
  if (e15)
  begin
    e15=0;
    e16=1;
  end
end

always @(posedge Clock)
begin
  if (e16)
  begin
    Mrr_unit_rset=1;
    e16=0;
    e16a=1;
  end
end

always @(negedge Clock)
begin
  if (e16a)
  begin
    Mrr_unit_rset=0;
    e16a=0;
    e16b=1;
  end
end

// Propagate values to gates affected in fan-out lists.

always @(posedge Clock)
begin
  if (e16b)
  begin
    Grr_bnk_search_enbl=0;
    Mrr_unit_decrmt_enbl=1;
    Fo_gen_unit_rset=0;
    Fo_gen_unit_val_in=Out_val_tbl[Ovt_i];
    e16b=0;
    e17=1;
    $display("Start of fanout list at time=%d",$time);
  end
end

always @(negedge Clock)
begin
  if (e17)
  begin
    Fo_gen_unit_load=0;
    e17=0;
    e18=1;
  end
end

always @(posedge Clock)
begin
  if (e18)
  begin
    if (Mrr_unit_hit_fnd_flag)
    begin
      Fo_gen_unit_load=1;
      e18=0;
      e19=1;
    end
    else
    if ((!Mrr_unit_hit_fnd_flag) & (Mrr_unit_end_scan_flg))
    begin
      e18=0;
      e22=1;
      Mrr_unit_decrmt_enbl=0;
    end
  end
end

always @(negedge Clock)
begin
if (e19)
begin
Fo_gen_unit_load=0;
Inval_unit_wrt_enbl=1;
Mrr_unit_rset_hit_fnd_flg=0;
e19=0;
e20=1;
end
end

always @(posedge Clock)
begin
if (e20)
begin
if (!Fo_gen_unit_flg)
begin
if (!Mrr_unit_end_scan_flg)
begin
Mrr_unit_rset_hit_fnd_flg=1;
Inval_unit_wrt_enbl=0;
e20=0;
e21=1;
end
else
begin
Inval_unit_wrt_enbl=0;
e20=0;
e22=1;
end
end
end
end

always @(negedge Clock)
begin
if (e21)
begin
e18=1;
e21=0;
end
end

always @(negedge Clock)
begin
if (e22)
begin
e22=0;
e23=1;
Epvt_i=Epvt_i+1; Ntpt_i=Ntpt_i+1;
Gmlat_i=Gmlat_i+1; Gilat_i=Gilat_i+1;
Tpt_i=Tpt_i+1;
Grit_i=Grit_i+1; Grmt_i=Grmt_i+1;
Ovt_i=Ovt_i+1;
$display("Termination of Fan out update, time=%d", $time);
end
end

always @(posedge Clock)
begin
if (e23)
begin
e23=0;
Num_tst_cnt=Num_tst_cnt-1;
if (Num_tst_cnt==0)
begin
e24=1;
end
else
Gate_eval_init_proc=1;
end
end

always @(negedge Clock)
begin
if (e24)
begin
$display("E24 attained. End of fanout update.");
$display(" ");
Inval_unit_shft_enbl=1;
Shft_cnt=Init_shft_val;
e24=0;
e25=1;
end
end

// Input_val_bnk is +ve edge triggered. Thus next block is -ve edge.

always @(posedge Clock)
begin
if (e25)
begin
$display("E25 attained");
Shft_ary_1b=1;
e25=0;
e26=1;
end
end

always @(negedge Clock)
begin
if (e26)
begin
$display("E26 attained");
Shft_cnt=Shft_cnt-1;
if (Shft_cnt==0)
begin
e26=0;
Inval_unit_shft_enbl=0;
e27=1;
end
end
end

always @(posedge Clock)
begin
if (e27)
begin
Shft_ary_1b=0;
e27=0;
e28=1;
end
end

always @(negedge Clock)
begin
if (e28)
begin
e28=0;
e29=1;
end
end

always @(posedge Clock)
begin
if (e29)
begin
Gate_eval_init_proc=1;
Num_tst_cnt=Num_tst_seq;
Ept_i=8'h00; Epvt_i=8'h00; Ntpt_i=8'h00;
Gmlat_i=8'h00; Gilat_i=8'h00; Tpt_i=8'h00;
Grit_i=8'h00; Grmt_i=8'h00; Ovt_i=8'h00;
e29=0;
end
end

endmodule



WO 01/01298




The APPLES architecture is designed to provide a fast and flexible mechanism for logic simulation. The technique of applying test patterns to an associative memory results in fixed-time gate processing and a flexible delay model. Multiple scan registers provide an effective way of parallelising the fan-out updating procedure. This mechanism eliminates the need for conventional parallel techniques such as load balancing and deadlock avoidance or recovery. Consequently, parallel overheads are reduced. As more scan registers are introduced, the gate evaluation rate increases, ultimately being limited by the average fan-out list size per gate and consequently by the memory bandwidth of the fan-out list memory.
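As an illustration only, the parallel fan-out scan can be sketched in Python. The sketch assumes that the hit list is a list of 0/1 flags indexed by gate address and that each scan register owns one contiguous, equal-sized segment; the function names are hypothetical and are not taken from the APPLES Verilog. Each segment is scanned with the sequential steps of claim 6 (test the bit, record the address, clear the bit, move on), and a segment's scan stops as soon as the OR of its remaining bits is zero, standing in for the exception logic.

```python
from typing import List


def scan_segment(hit_list: List[int], start: int, end: int) -> List[int]:
    """Scan hit_list[start:end] as one scan register would (sketch).

    At each position: if the bit is set, record the gate address and clear
    the bit, then move on. The scan ends early once the OR of the remaining
    bits in the segment is 0, i.e. there are no further matches.
    """
    found: List[int] = []
    for addr in range(start, end):
        if not any(hit_list[addr:end]):  # OR of remaining bits is 0
            break
        if hit_list[addr]:
            found.append(addr)   # gate whose output changed
            hit_list[addr] = 0   # clear the bit in the hit list
    return found


def segmented_scan(hit_list: List[int], num_scan_registers: int) -> List[List[int]]:
    """Split the hit list into equal contiguous segments, one per scan register.

    Hardware scans the segments concurrently; this model simply visits them
    in turn and returns the addresses found in each segment.
    """
    n = len(hit_list)
    size = max(1, -(-n // num_scan_registers))  # ceiling division, at least 1
    return [scan_segment(hit_list, s, min(s + size, n)) for s in range(0, n, size)]
```

In hardware the segments are scanned at the same time, so the scan time is set by the busiest segment rather than by the length of the whole hit list.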

Referring to Fig. 8, there is illustrated an array indicated generally by the reference numeral 20 comprising a plurality of cells 21, each of which comprises an APPLES processor as described above. A synchronisation logic control 22 is provided. The circuit that is to be simulated is split up among the APPLES processors. Gate evaluations are carried out independently in each processor or cell 21. Each cell 21 is provided with a local input value register bank and a foreign input value register bank to allow interconnection, which is done through an interconnecting network 23 incorporating the synchronisation logic 22. The connections between the synchronisation logic circuit 22 (which is, strictly speaking, the main synchronisation logic circuit) and each of the cells 21 are not shown.

After all gate evaluations for all gate types and the corresponding updates have occurred on a given processor forming a cell 21, the processor must wait for all other processors to reach the same state. When all processors reach this state, the respective input value register banks can be shifted into the respective associative register array 1b, and evaluation of the next time unit can occur. Implementation therefore requires that a suitable interconnecting network be designed and an interface to the APPLES processor constructed. A synchronisation method must exist to determine when evaluation of the next time unit should proceed. A system to split the hit list information amongst the processors is required in order to initialise the system.

The array of processors is implemented as a torus (equivalent to a 2D mesh with wrap-around), as shown in Fig. 8. The inclusion of wrap-around connections reduces the network diameter, increasing the network speed. It also means that each processor can be identical, without wasted hardware at the edges of the array. It does, however, require a more complicated routing mechanism. No fixed size was used for the array; instead, the array size was treated as a parameter and varied during simulations. It was specified by a command line parameter to the Verilog compiler. These command line parameters are covered in detail in the next chapter.
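The effect of the wrap-around links on network diameter can be checked with a short Python sketch. It assumes a width x height array in which each cell links only to its four nearest neighbours, and it compares a plain 2D mesh with the torus by breadth-first search; the function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def neighbours(x: int, y: int, width: int, height: int, wrap: bool) -> List[Coord]:
    """East, west, south and north neighbours of cell (x, y).

    With wrap=True opposite edges are joined (torus); with wrap=False edge
    cells simply lack some neighbours (plain 2D mesh).
    """
    result: List[Coord] = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if wrap:
            result.append((nx % width, ny % height))
        elif 0 <= nx < width and 0 <= ny < height:
            result.append((nx, ny))
    return result


def diameter(width: int, height: int, wrap: bool) -> int:
    """Largest shortest-path hop count between any two cells, found by BFS."""
    worst = 0
    for sx in range(width):
        for sy in range(height):
            dist: Dict[Coord, int] = {(sx, sy): 0}
            queue = deque([(sx, sy)])
            while queue:
                cell = queue.popleft()
                for n in neighbours(cell[0], cell[1], width, height, wrap):
                    if n not in dist:
                        dist[n] = dist[cell] + 1
                        queue.append(n)
            worst = max(worst, max(dist.values()))
    return worst
```

For an 8 x 8 array the sketch gives a diameter of 14 hops for the mesh and 8 hops for the torus, and every torus cell has exactly four neighbours, which is why identical cells can be used throughout.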

Each cell is connected to its four neighbouring cells via serial connections. Parallel connections would be faster; however, a Virtex FPGA was used, and it has a limited number of pins. Not all of these pins may be available to a particular design, owing to the FPGA architecture. Pins are therefore a precious resource. Since each FPGA would require eight parallel connections (an input and an output connection on each of the four edges), a parallel network would require a large number of pins. If at a later stage it is discovered that there are spare pins and a parallel network is justified, then the design could be altered. In this design each cell has a serial input and a serial output on each of its four edges. These serial connections each consist of a data line and two control lines. These serial connections will therefore require 12 pins on each Virtex FPGA. Each cell is also connected to the array's synchronisation logic.

In order to design the network, knowledge of the information that the network must carry is required. The network is required in order to pass fan-out updates between processors. These updates can be passed as messages. Each message is an update and consists of a destination address and an update value. A single Virtex FPGA was used to implement an APPLES processor capable of simulating a circuit with approximately 256 gates. This figure is somewhat arbitrary, and further design work will reveal the true value required. Given a constraint of 256 gates per processor, approximately 64 processors would be required to simulate a reasonably complex circuit. This corresponds to an 8 x 8 array. Each processor will need to be able to send updates to any other processor, updating any one of its 512 gate inputs. This implies a six-bit address to identify the processor and a nine-bit address to identify the wire. Each update sent also requires an update value. These values are three bits wide (enabling support for eight-state logic). Therefore messages sent from processor to processor will need to be eighteen bits wide (6 + 9 + 3). These figures are arbitrary but are a useful starting point.
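The eighteen-bit figure follows directly from the field widths. The Python sketch below derives the widths with `int.bit_length` and packs and unpacks a message. The order of the fields within the word (update value in the top three bits) is an assumption made for illustration, not a detail given in the description, and the function names are hypothetical.

```python
NUM_PROCESSORS = 64      # 8 x 8 array
INPUTS_PER_PROC = 512    # 256 two-input gates
LOGIC_STATES = 8         # eight-state logic

PROC_BITS = (NUM_PROCESSORS - 1).bit_length()   # 6
WIRE_BITS = (INPUTS_PER_PROC - 1).bit_length()  # 9
VALUE_BITS = (LOGIC_STATES - 1).bit_length()    # 3
MSG_BITS = PROC_BITS + WIRE_BITS + VALUE_BITS   # 18


def pack_message(proc: int, wire: int, value: int) -> int:
    """Build one update message; the field order is an illustrative assumption."""
    for field, bits in ((proc, PROC_BITS), (wire, WIRE_BITS), (value, VALUE_BITS)):
        if not 0 <= field < (1 << bits):
            raise ValueError(f"{field} does not fit in {bits} bits")
    return (value << (PROC_BITS + WIRE_BITS)) | (proc << WIRE_BITS) | wire


def unpack_message(msg: int) -> tuple:
    """Recover (processor, wire, value) from a packed message."""
    wire = msg & ((1 << WIRE_BITS) - 1)
    proc = (msg >> WIRE_BITS) & ((1 << PROC_BITS) - 1)
    value = msg >> (PROC_BITS + WIRE_BITS)
    return proc, wire, value
```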

The structure of a cell 21 is shown in Fig. 9. Each of the four edges has a transmitter 25 and a receiver 26. These modules handle the serial connections. The transmitter 25 takes in an eighteen-bit message and sends it out as a bit stream. The receiver 26 takes in the bit stream and reconstitutes it into the original eighteen-bit message.
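The transmitter and receiver pair amounts to a parallel-to-serial conversion and back. A minimal Python sketch follows; it assumes the most significant bit is sent first, does not model the two control lines (framing and handshaking), and uses hypothetical function names.

```python
from typing import Iterable, List

WORD_BITS = 18


def transmit(word: int, width: int = WORD_BITS) -> List[int]:
    """Turn a message into a bit stream, most significant bit first (assumed)."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit in the message width")
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


def receive(bits: Iterable[int], width: int = WORD_BITS) -> int:
    """Shift incoming bits into a register to rebuild the original message."""
    word = 0
    count = 0
    for bit in bits:
        word = (word << 1) | (bit & 1)
        count += 1
    if count != width:
        raise ValueError("incomplete message")
    return word
```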


A request scanner 27 checks every receiver 26 and the APPLES processor 30 simultaneously to see whether they have messages waiting to be routed. It assigns each of these sources a rotating priority and picks the source that has a message and the highest priority. It then passes the picked message to a request router 28.
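A rotating priority of this kind is commonly realised as a round-robin arbiter. The Python sketch below is one such arbiter, under the assumption (not stated in the description) that after each grant the source just after the winner receives top priority; the class and method names are hypothetical.

```python
from typing import List, Optional


class RequestScanner:
    """Rotating-priority selection among message sources (illustrative sketch).

    Sources are numbered 0..n-1, e.g. four receivers plus the local processor.
    After a grant, the source just after the winner gets top priority, so
    every waiting source is eventually served; this rotation rule is assumed.
    """

    def __init__(self, num_sources: int) -> None:
        self.num_sources = num_sources
        self.highest = 0  # source that currently has top priority

    def pick(self, has_message: List[bool]) -> Optional[int]:
        """Return the highest-priority source with a waiting message, or None."""
        for offset in range(self.num_sources):
            src = (self.highest + offset) % self.num_sources
            if has_message[src]:
                self.highest = (src + 1) % self.num_sources
                return src
        return None
```

With sources 0 and 2 both waiting, successive calls grant 0, then 2, then 0 again, so neither source can starve the other.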


The request router 28 passes its messages either to the APPLES processor 30 or to a transmitter 25. If the option chosen is a transmitter, then the message will be sent to a different cell 21. If the option chosen is the APPLES processor 30, then the message is an update for the local processor. A synchronisation logic circuit 31 controls the cell 21 through the synchronisation logic circuit 22.

In Fig. 9 every transmitter, every receiver and the input and output ports of the APPLES processor have buffers connected. A command line parameter to the Verilog compiler specifies whether these components are to be used or removed from the design. One slightly different behaviour of these buffers is that they process data in a LIFO fashion. The effect of these buffers on performance is an important part of the system analysis.

The request router 28 employs one of two different routing techniques. The technique used is determined by a command line parameter to the Verilog simulator used to implement the invention. A comparison of the routing techniques is important to the understanding of the invention. Both routing techniques operate in a similar manner.
The request router 28 decodes the message and can then determine the destination processor. It determines all the valid options for routing the message: the message could be routed to the local APPLES processor 30 or to one of the transmitters 25. The message is then routed to one of the valid options.


The first routing technique produces only one valid routing option; if that route is not blocked, then the message is routed in that direction. If it is blocked, the request router 28 attempts to route a different message. Messages are passed from cell 21 to cell 21 until they reach their destination. Under this routing technique a message is passed first in either the east or west direction until it is at the correct east-west location. It is then routed in the north or south direction until the message arrives at its destination. The net result is that the message travels the minimum distance. This routing strategy results in the traffic between any two given cells 21 always following the same route through the network. This routing strategy can be called standard routing.
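Standard routing is a form of dimension-order routing. The Python sketch below computes the next hop and the resulting path under several assumptions not given in the description: x grows to the east and y to the south, "minimum distance" on the torus means going whichever way round each ring is shorter, and a tie goes east or south. The function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
MOVES = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}


def ring_step(src: int, dst: int, size: int, plus: str, minus: str) -> str:
    """Direction along one ring of the torus, or '' if already aligned.

    Picks whichever way round the ring is shorter; a tie goes to 'plus'.
    """
    if src == dst:
        return ""
    forward = (dst - src) % size  # hops needed in the + direction
    return plus if forward <= size - forward else minus


def standard_route(here: Coord, dst: Coord, width: int, height: int) -> str:
    """Next hop: correct east-west position first, then north-south."""
    step = ring_step(here[0], dst[0], width, "E", "W")
    if step:
        return step
    return ring_step(here[1], dst[1], height, "S", "N") or "LOCAL"


def full_path(src: Coord, dst: Coord, width: int, height: int) -> List[str]:
    """Follow standard_route hop by hop; a given pair always gets the same path."""
    path: List[str] = []
    x, y = src
    while True:
        step = standard_route((x, y), dst, width, height)
        if step == "LOCAL":
            return path
        path.append(step)
        dx, dy = MOVES[step]
        x, y = (x + dx) % width, (y + dy) % height
```

For example, on an 8 x 8 torus a message from (0, 0) to (6, 1) goes west twice across the wrap-around link and then south once.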

The second routing technique is more complicated. Under this strategy the request router 28 determines all of the available directions that can be taken by the message which will result in it travelling the shortest distance. The various options have different priorities associated with them. This priority is based on the options that were previously taken, and it helps to use the various routes evenly and therefore efficiently. Some of the options may not be feasible, as they may be in use by previous messages. An option is chosen based on priority and availability, and the priority information is then updated. This routing strategy can be called advanced routing.
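Advanced routing can be sketched as adaptive minimal routing. In the Python sketch below, every direction that keeps the message on a shortest path round the torus is a candidate, busy links are skipped, and a rotating order over the four directions stands in for the priority information; that particular priority scheme, the orientation (x east, y south) and all names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]
DIRECTIONS = ["E", "S", "W", "N"]


def productive_directions(src: Coord, dst: Coord, width: int, height: int) -> List[str]:
    """All directions that keep the message on a shortest path round the torus."""
    options: List[str] = []
    for s, d, size, plus, minus in ((src[0], dst[0], width, "E", "W"),
                                    (src[1], dst[1], height, "S", "N")):
        if s == d:
            continue
        forward = (d - s) % size
        backward = size - forward
        if forward <= backward:
            options.append(plus)
        if backward <= forward:
            options.append(minus)  # on a tie both ways round are productive
    return options


class AdvancedRouter:
    """Chooses among productive directions by rotating priority (assumed scheme)."""

    def __init__(self) -> None:
        self.next_first = 0  # index into DIRECTIONS holding top priority

    def route(self, src: Coord, dst: Coord, width: int, height: int,
              busy: Dict[str, bool]) -> Optional[str]:
        if src == dst:
            return "LOCAL"
        options = productive_directions(src, dst, width, height)
        for offset in range(len(DIRECTIONS)):
            d = DIRECTIONS[(self.next_first + offset) % len(DIRECTIONS)]
            if d in options and not busy.get(d, False):
                # update priority so the next message favours another link
                self.next_first = (DIRECTIONS.index(d) + 1) % len(DIRECTIONS)
                return d
        return None  # every shortest-path link is blocked
```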

For both routing techniques, when all valid paths are blocked and the request router 28 is unable to route its message, it simply drops the message. This is an important aspect of the manner in which the request scanner 27 and request router 28 work together. When the request scanner 27 takes a message from one of its sources, it does not inform the source that it is attempting to route this message. The source maintains the message at its output. If the request router 28 successfully routes the message, then it tells the request scanner 27 that it has done so, and the request scanner 27 informs the source. In this way the request router 28 is not committed to routing a particular message, and is therefore always free to attempt to route messages.
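Because the source keeps its message until told otherwise, dropping a message costs nothing but a retry. A minimal Python sketch of one scanner/router cycle follows; `pick` and `try_route` are hypothetical callbacks standing in for the request scanner and request router.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


def network_cycle(sources: List[Deque[int]],
                  pick: Callable[[List[bool]], Optional[int]],
                  try_route: Callable[[int], bool]) -> Optional[int]:
    """One scanner/router cycle in which a source keeps its message until acknowledged.

    pick chooses a source with a waiting message; try_route attempts to
    forward the message and returns False when every valid path is blocked.
    On failure the router simply drops its copy: the original still sits at
    the head of the source's queue and will be offered again later.
    """
    src = pick([len(q) > 0 for q in sources])
    if src is None:
        return None
    msg = sources[src][0]          # peek only; the source is not told yet
    if try_route(msg):
        sources[src].popleft()     # acknowledgement: the source releases it
        return msg
    return None
```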

The network interface 42 shares access to the input value register bank 2 between the local processor and the network. The local processor gets priority. This module decodes the message and updates the appropriate location in the input value register bank 2.

The network interface 42 is connected between the fan-out generator 43 and the input value register bank 2. It can therefore pass fan-out updates from the processor to the network when appropriate, or simply pass them to the input value register bank 2. It can also pass fan-out updates from the network to the input value register bank 2. Some changes were required in the fan-out generator 43 to accommodate the network interface 42.
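The network interface's two jobs (deciding whether an update is local or remote, and arbitrating the bank with local priority) can be sketched in Python. The address layout follows the array address used in this design (x in bits 14 to 12, y in bits 11 to 9, local input in bits 8 to 0). The one-write-per-cycle port model, the assumption that a remote-bound update does not occupy the bank port, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Update = Tuple[int, int]  # (array address, value)


def make_array_address(x: int, y: int, local: int) -> int:
    """x in bits 14-12, y in bits 11-9, local gate input address in bits 8-0."""
    return (x << 12) | (y << 9) | local


def split_array_address(addr: int) -> Tuple[int, int, int]:
    return (addr >> 12) & 0x7, (addr >> 9) & 0x7, addr & 0x1FF


class NetworkInterface:
    """Hypothetical model with a single bank write port per cycle."""

    def __init__(self, my_x: int, my_y: int, bank_size: int = 512) -> None:
        self.my_x, self.my_y = my_x, my_y
        self.bank: List[int] = [0] * bank_size   # input value register bank
        self.outgoing: List[Update] = []         # updates handed to the network

    def cycle(self, local_update: Optional[Update] = None,
              network_update: Optional[Update] = None) -> Optional[Update]:
        """Handle one cycle and return any update that must be retried later.

        A local-processor update either goes to the network (remote
        destination) or claims the bank port (local destination); the local
        processor has priority, so a network update then has to wait.
        """
        if local_update is not None:
            x, y, local = split_array_address(local_update[0])
            if (x, y) != (self.my_x, self.my_y):
                self.outgoing.append(local_update)  # assumed not to use the bank port
            else:
                self.bank[local] = local_update[1]
                return network_update  # port taken; network update waits
        if network_update is not None:
            _, _, local = split_array_address(network_update[0])
            self.bank[local] = network_update[1]
        return None
```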


When each processor in the array has processed the fan-out list for each of its active gates and all updates have reached their destination, each processor can shift its input value register bank 2 into its array 1b and proceed with evaluation of the next time unit. In order to achieve this, some synchronisation logic between the cells 21 is required. The implementation for this requires each processor to report to its cell 21 when it has completed sending updates. Each cell 21 also monitors the network activity and reports back to the array stating whether there is network activity or processor activity. The array therefore knows when all processors have finished updating and when the network is empty. At such a time the array reports back to the cells 21, which then tell the processors to proceed with the next time unit in the delay model. The implementation of this system required minor changes in the sequence logic of the APPLES processor.

The network is not used to communicate this synchronisation information. Instead, dedicated wires are provided. Each cell 21 has a finished input wire and a finished output wire. The cell 21 holds the finished output wire high when its processor has finished and no network activity is occurring around the cell 21. The finished input wire is controlled by the array synchronisation logic. The array holds it high when it detects that all the finished output wires are high at the same time. It would be possible to use the network to communicate this synchronisation information. This would reduce the number of Virtex pins required by the design. However, the synchronisation logic would be more complex and require more circuitry, and the synchronisation process would also take longer to execute.
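The dedicated-wire scheme reduces to a wide AND across the array. A minimal Python sketch follows, assuming each cell can count the messages held in its own buffers, transmitters and receivers; the function names are hypothetical.

```python
from typing import List


def cell_finished(processor_done: bool, messages_nearby: int) -> bool:
    """A cell's finished output: its processor is done and no traffic is around it."""
    return processor_done and messages_nearby == 0


def next_time_unit_allowed(processor_done: List[bool], messages_nearby: List[int]) -> bool:
    """Array logic: raise every finished input only when all finished outputs are high."""
    return all(cell_finished(d, m) for d, m in zip(processor_done, messages_nearby))
```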


The information pertaining to the circuit description is stored in five memories within an APPLES processor. Under the basic APPLES Verilog design these memories are loaded from data files using the $readmem system tasks ($readmemb/$readmemh). For the system to be implemented on a Virtex chip, these memories could be loaded via a PCI interface.

Under the APPLES array, each processor evaluates part of the circuit to be simulated. The contents of these five memories need to be split among the processors in the array. The memory contents also need to be processed in order to make them compatible with the array design. Under an implementation using an array of Virtex chips, this data could be loaded via a PCI bus and distributed using the array network. The data would be pre-processed for the array, and each processor would simply need to load the data into its memories. The incorporation into the design of a system to distribute this data is non-trivial. This project is mainly concerned with the analysis of the array design's ability to simulate circuits; an analysis of the array's initialisation system is not of paramount importance at this time. As a result, the initialisation system was not designed.

In order to initialise the design and so facilitate simulating circuits, a Verilog task was written to load the memories. The single-processor circuit description files are loaded into a global memory in the design. Each processor in the array is assigned a number, calculated by multiplying its y co-ordinate by the array width and adding its x co-ordinate. Each processor loads a segment of the global Array 1a, Array 1b, the fan-out header table and the fan-out size table into its local memory. These segments are of equal size and are chosen on the basis of the processor number: processor zero takes the first segment, processor one takes the second segment, and so on. A segment of the fan-out vector table must also be loaded. This segment is determined by looking at the contents of the local fan-out size and fan-out header tables. The first address to be loaded from the global fan-out vector table is the address stored in the first location in the local fan-out header table. The last address to be loaded is calculated by adding the address stored in the last entry in the local fan-out header table to the last fan-out size, stored in the final entry in the local fan-out size table. The addresses within the fan-out header table must be adjusted to point at the new local fan-out vector table. This is achieved by subtracting the address stored in the first location in the local fan-out header table from each address in the same table. Each gate input address stored in the local fan-out vector table must be converted into an array address. An array address consists of the destination processor's x co-ordinate stored in bits fourteen to twelve, the destination processor's y co-ordinate stored in bits eleven to nine, and the gate input's local address on the destination processor stored in bits eight to zero.
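The splitting steps above can be sketched in Python. Two points are assumptions rather than details from the description: the header-plus-size sum is treated as one past the last fan-out vector entry (an exclusive bound), and global gate input addresses are converted on the basis that gate inputs were split into equal consecutive segments in processor-number order. All names are hypothetical.

```python
from typing import Dict, List


def processor_number(x: int, y: int, array_width: int) -> int:
    """Processor number = y * array width + x."""
    return y * array_width + x


def local_segment(table: List[int], proc: int, num_procs: int) -> List[int]:
    """Equal-sized slice of a global table; processor 0 takes the first slice."""
    size = len(table) // num_procs
    return table[proc * size:(proc + 1) * size]


def to_array_address(global_input: int, inputs_per_proc: int, array_width: int) -> int:
    """Global gate input address -> x (bits 14-12), y (bits 11-9), local (bits 8-0)."""
    proc, local = divmod(global_input, inputs_per_proc)
    y, x = divmod(proc, array_width)
    return (x << 12) | (y << 9) | local


def load_processor(x: int, y: int, array_width: int, num_procs: int,
                   fo_header: List[int], fo_size: List[int], fo_vector: List[int],
                   inputs_per_proc: int) -> Dict[str, List[int]]:
    """Build one processor's local fan-out tables from the global ones."""
    p = processor_number(x, y, array_width)
    header = local_segment(fo_header, p, num_procs)
    sizes = local_segment(fo_size, p, num_procs)
    first = header[0]
    end = header[-1] + sizes[-1]  # treated here as one past the last entry
    vector = [to_array_address(g, inputs_per_proc, array_width)
              for g in fo_vector[first:end]]
    header = [h - first for h in header]  # rebase onto the local vector table
    return {"header": header, "sizes": sizes, "vector": vector}
```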

Using this system, the circuit description is split among the processors. No consideration is given to deciding which gate is simulated on which processor: the APPLES circuit description files determine where each gate is simulated. The layout of these files is determined by the layout of the ISCAS-85 netlist files that were used to generate the APPLES circuit description files.

Referring to Fig. 10, there is illustrated an alternative layout of processor in which parts similar to those described with reference to Fig. 1 are identified by the same reference numerals. In this embodiment, the scan registers are identified by the reference numeral 6a and the general logic sequence is identified by the reference numeral 40. The processor will also include a circuit splitting logic circuit 41 and a network interface 42. A fan-out generator 43 is identified and will include, for example, the fan-out memory 8. The network interface 42 shares access to the input value register bank 2.

The original APPLES design is written in Verilog, as is the array design. The Verilog code is written at a behavioural level, which is the most abstract level available to a Verilog programmer. As with any Verilog system, it is split into Verilog modules, each module being a component of the system. The Verilog modules added under the APPLES array design are:
• The Top Module

• The Array Module

• The Cell Module

• The Receiver Module

• The Transmitter Module

• The Request Scanner Module

• The Request Router Module

• The Buffer Module

• The Network Interface Module


The Top module is used to test that the system is performing correctly. An instantiation of the Top module contains an instantiation of the Array module. The Array contains multiple instantiations of the Cell module. Each Cell contains four instantiations of both the Transmitter and Receiver modules. A Cell also contains a Request Scanner, a Request Router, several buffers and an APPLES processor. The APPLES processor contains instantiations of the standard processor components along with an instantiation of the Network Interface module. This structure and the behaviour of these modules were described earlier in this chapter. Each of these modules is contained within an appropriately named file.


In addition to designing these modules, the array design also required the following changes:

• The introduction of a Verilog task to split the circuit description information among the processors in the array. This is located in the APPLES processor module.

• The incorporation of processor synchronisation logic into the APPLES processor module, the Cell module and the Array module.

• The integration of the Network Interface module into the APPLES processor.


The APPLES architecture incorporates an alternative timing strategy which obviates the need for complex deadlock avoidance or recovery procedures and other mechanisms normally part of an event-driven simulation. The present invention has an overhead considerably lower than that of conventional approaches and permits gate evaluation to be performed in memory. The reduction in processing overheads is manifest in improved speedup performance relative to other techniques.

The message passing mechanism inherent in the Chandy-Misra algorithms has been replaced by a parallel scanning mechanism. This mechanism allows the fan-out/update procedure to be parallelised. As clashes occur, gates are effectively put into a waiting queue which fills up a fan-out/update pipeline. Consequently, as the pipeline fills up (with an increasing number of scan registers), performance increases. The speedup reaches a limit when the rate at which new gates enter the queue equals the fan-out rate. Nevertheless, the speedup and the number of cycles per gate processed are considerably better than with conventional approaches. The system also allows a wide range of delay models.

The bit-pattern gate evaluation mechanism in APPLES facilitates the implementation of simple and complex delay models as a series of parallel searches. Consequently, the evaluation process is constant in time, being performed in memory. Effectively, there is a one-to-one correspondence between gate and processor (the gate word pairs). This fine-grain parallelism allows maximum parallelism in the gate evaluation phase. Active gates are automatically identified and their fan-out lists updated through scanning a hit list. This scanning mechanism is analogous to the communication overhead in typical parallel processing architectures; however, this scanning is itself amenable to parallelisation. Multiple scan registers reduce the overhead time and enable the gate processing rate to be limited solely by the fan-out memory bandwidth. A substantial speedup of logic simulation is attained with the APPLES architecture, resulting in a gate processing time of a few machine cycles per gate.
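The parallel search at the heart of bit-pattern evaluation can be modelled in a few lines of Python. In this sketch (an assumed encoding, not the APPLES test template format) each gate word holds two 4-bit input histories with bit 0 as the most recent time unit, and a test template is a pattern plus a mask whose zero bits are don't-cares. Every word is compared against the template, and the results form a hit list; the names are hypothetical.

```python
from typing import List


def gate_word(a_history: int, b_history: int, history_bits: int = 4) -> int:
    """Concatenate two input histories (bit 0 = most recent time unit)."""
    return (a_history << history_bits) | b_history


def associative_search(words: List[int], pattern: int, mask: int) -> List[int]:
    """Compare every stored word against one template and return a hit list.

    A word hits when all bit positions selected by mask equal the pattern;
    unmasked positions are don't-cares. Hardware performs every comparison
    in the same cycle, so evaluation time does not grow with the number of
    gates; this loop only models the result.
    """
    return [1 if ((w ^ pattern) & mask) == 0 else 0 for w in words]
```

For example, a hypothetical unit-delay AND test that checks only the previous time unit of both inputs uses pattern and mask 0x22: the words 0x22 and 0xF3 hit, while 0x20 and 0x02 do not.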

In this specification, the terms "comprise", "comprises" and "comprising" are used interchangeably with the terms "include", "includes" and "including", and are to be afforded the widest possible interpretation and vice versa.



The invention is not limited to the embodiments hereinbefore described which may be 
varied in both construction and detail within the scope of the claims. 
CLAIMS

1. A parallel processing method of logic simulation comprising representing signals on a line over a time period as a bit sequence, evaluating the output of any logic gate including an evaluation of any inherent delay by a comparison between the bit sequences of its inputs to a predetermined series of bit patterns and in which those logic gates whose outputs have changed over the time period are identified during the evaluation of the gate outputs as real gate changes and only those real gate changes are propagated to fan out gates and in which the control of the method is carried out in an associative memory mechanism which stores in word form a history of gate input signals by compiling a hit list register of logic gate state changes and using a multiple response resolver forming part of the associative memory mechanism which generates an address for each hit, and then scans and transfers the results on the hit list to an output register for subsequent use characterised in that the hit list is segmented into a plurality of separate smaller hit lists each connected to a separate scan register and in which each scan register is operated in parallel to transfer the results to the output register.

2. A method as claimed in claim 1 in which the associative register is divided into separate smaller associative sub-registers, one type of logic gate being allocated to each sub-register, each of which associative sub-registers has corresponding sub-registers connected thereto whereby gate evaluations and tests are carried out in parallel on each associative sub-register.

3. A method as claimed in claim 1 or 2 in which each associative sub-register is used to form a hit list connected to a corresponding separate scan register.

4. A method as claimed in any of claims 1 to 3 in which where the number of the one type of logic gate exceeds a predetermined number more than one sub-register is used.



5. A method as claimed in any preceding claim in which the scan registers are controlled by exception logic using an OR gate whereby the scan is terminated for each register on the OR gate changing state thus indicating no further matches.

6. A method as claimed in claim 5 in which the scan is carried out by sequential counting through the hit list and the steps are performed of:

checking if the bit is set indicating a hit;

if a hit, determining the address affected by that hit;

storing the address;

clearing the bit in the hit list;

moving to the next position in the hit list; and

repeating the above steps until the hit list is cleared.

7. A method as claimed in any preceding claim, in which each line signal to a target logic gate is stored as a plurality of bits each representing a delay of one time period, the aggregate bits representing the delay between signal output to and reception by the target logic gate.

8. A method as claimed in any preceding claim, in which each delay is stored as a delay word in an associative memory forming part of the associative memory mechanism in which:-

the length of the delay word is ascertained; and

if the delay word width exceeds the associative register word width:-

the number of integer multiples of the register word width contained within the delay word is calculated as a gate state;

the gate state is stored in a further state register;

the remainder from the calculation is stored in the associative register with those delay words whose widths did not exceed the associative register word width; and

on the count of the associative register commencing:

the state register is consulted for the delay word entered in the state register and the remainder is ignored for this count of the associative register;

at the end of the count of the associative register, the state register is updated; and

the count continues until the remainder represents the count still required.

9. A method as claimed in any preceding claim in which there is an initialisation phase in which:

specified signal values are inputted;

unspecified signal values are set to unknown;

test templates are prepared defining the delay model for each logic gate;

the input circuit is parsed to generate an equivalent circuit consisting of 2-input logic gates; and

the 2-input logic gates are then configured.

10. A method as claimed in any preceding claim in which a multi-valued logic is applied and in which n bits are used to represent a signal value at any instance in time with n being any arbitrarily chosen logic.
11. A method as claimed in claim 10 in which an 8-valued logic is used where 000 represents logic 0, 111 represents logic 1 and 001 to 110 represent arbitrarily defined other signal states.

12. A method as claimed in claim 10 or 11 in which the sequence of values on a logic gate is stored as a bit pattern forming a unique word in the associative memory mechanism.

13. A method as claimed in any preceding claim in which there is stored a record of all values that a logic gate has acquired for the units of delay of the longest delay in the circuit.

14. A parallel processing method of logic simulation comprising representing signals on a line over a time period as a bit sequence, evaluating the output of any logic gate including an evaluation of any inherent delay by a comparison between the bit sequences of its inputs to a predetermined series of bit patterns and in which those logic gates whose outputs have changed over the time period are identified during the evaluation of the gate outputs as real gate changes and only those real gate changes are propagated to fan out gates and in which the control of the method is carried out in an associative memory mechanism which stores in word form a history of gate input signals by compiling a hit list register of logic gate state changes and using a multiple response resolver forming part of the associative memory mechanism which generates an address for each hit, and then scans and transfers the results on the hit list to an output register for subsequent use characterised in that the associative register is divided into separate smaller associative sub-registers, one type of logic gate being allocated to each associative sub-register, each of which associative sub-registers has corresponding sub-registers connected thereto whereby gate evaluations and tests are carried out in parallel on each associative sub-register.



15. A method as claimed in claim 1 in which the hit list is segmented into a plurality of separate smaller hit lists corresponding to each associative sub-register, each smaller hit list is connected to a separate scan register and in which each scan register is operated in parallel to transfer the results to the output register.

16. A method as claimed in claim 14 or 15 in which where the number of the one type of logic exceeds a predetermined number more than one sub-register is used.

17. A method as claimed in claim 16 in which the scan registers are controlled by exception logic using an OR gate whereby the scan is terminated for each register on the OR gate changing state thus indicating no further matches.

18. A method as claimed in claim 17 in which the scan is carried out by sequential counting through the hit list and the steps are performed of:

checking if the bit is set indicating a hit;

if a hit, determining the address affected by that hit;

storing the address;

clearing the bit in the hit list;

moving to the next position in the hit list; and

repeating the above steps until the hit list is cleared.

19. A method as claimed in any of claims 14 to 18 in which each line signal to a target logic gate is stored as a plurality of bits each representing a delay of one time period, the aggregate bits representing the delay between signal output to and reception by the target logic gate.



20. A method as claimed in any of claims 14 to 19 in which each delay is stored
as a delay word in an associative memory forming part of the associative
memory mechanism in which:-

the length of the delay word is ascertained; and

if the delay word width exceeds the associative register word width:-

the number of integer multiples of the register word width contained
within the delay word is calculated as a gate state;

the gate state is stored in a further state register;

the remainder from the calculation is stored in the associative register
with those delay words whose widths did not exceed the associative
register word width; and

on the count of the associative register commencing:-

the state register is consulted for the delay word entered in the state
register and the remainder is ignored for this count of the associative
register;

at the end of the count of the associative register, the state register is
updated; and

the count continues until the remainder represents the count still
required.
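The delay-word procedure of claim 20 amounts to a quotient-and-remainder split: whole multiples of the register word width go to the state register and only the remainder stays in the associative register. A minimal sketch, assuming delays and the word width are both given as counts of time periods (the function names are hypothetical):

```python
from typing import Tuple


def split_delay(delay: int, word_width: int) -> Tuple[int, int]:
    """Split a delay into (gate state, remainder).

    Delays that fit in the register word are stored unchanged with state 0;
    longer ones store the number of whole word widths as the gate state.
    """
    if delay <= word_width:
        return 0, delay
    return divmod(delay, word_width)


def count_delay(delay: int, word_width: int) -> int:
    """Count out a delay as the claim describes and return the elapsed periods."""
    state, remainder = split_delay(delay, word_width)
    elapsed = 0
    while state > 0:
        # The state register governs this count; the remainder is ignored.
        elapsed += word_width
        state -= 1  # update the state register at the end of the count
    # Finally the remainder represents the count still required.
    return elapsed + remainder
```

Counting full register-width passes and then the remainder reproduces the original delay exactly, so the split loses no delay information.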

21. A method as claimed in any of claims 14 to 20 in which there is an
initialisation phase in which:

specified signal values are inputted;

unspecified signal values are set to unknown;

test templates are prepared defining the delay model for each logic
gate;

the input circuit is parsed to generate an equivalent circuit consisting
of 2-input logic gates; and

the 2-input logic gates are then configured.
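Two steps of the initialisation phase in claim 21 lend themselves to a short sketch: defaulting unspecified signals to unknown, and rewriting a multi-input gate as 2-input gates. The sketch assumes only associative gate types (AND, OR, XOR) are decomposed by chaining; the `UNKNOWN` marker and the `_t` intermediate line names are hypothetical.

```python
from typing import Dict, List, Tuple

UNKNOWN = "X"  # hypothetical marker for an unspecified signal value

Gate = Tuple[str, str, str, str]  # (gate_type, input_a, input_b, output)


def initialise_signals(lines: List[str], specified: Dict[str, int]) -> Dict[str, object]:
    """Input the specified signal values and set every other line to unknown."""
    return {line: specified.get(line, UNKNOWN) for line in lines}


def to_two_input(gate_type: str, inputs: List[str], output: str) -> List[Gate]:
    """Rewrite an n-input AND/OR/XOR gate as a chain of 2-input gates of the same type.

    Inverting types such as NAND would need an extra inverter stage and are
    outside this sketch.
    """
    if gate_type not in {"AND", "OR", "XOR"}:
        raise ValueError(f"unsupported gate type: {gate_type}")
    if len(inputs) < 2:
        raise ValueError("need at least two inputs")
    gates: List[Gate] = []
    previous = inputs[0]
    for i, nxt in enumerate(inputs[1:], start=1):
        last = i == len(inputs) - 1
        target = output if last else f"{output}_t{i}"  # hypothetical intermediate line
        gates.append((gate_type, previous, nxt, target))
        previous = target
    return gates
```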

22. A method as claimed in any of claims 14 to 21 in which a multi-valued logic is 
applied and in which n bits are used to represent a signal value at any 
instance in time with n being any arbitrarily chosen logic. 

23. A method as claimed in claim 22 in which an 8-valued logic is used where 000
represents logic 0, 111 represents logic 1 and 001 to 110 represent arbitrarily
defined other signal states.

24. A method as claimed in claim 22 or 23 in which the sequence of values on a 
logic gate is stored as a bit pattern forming a unique word in the associative 
memory mechanism. 
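With n = 3 bits per value (claims 22 to 24), a time-ordered history of values can be packed into a single word by shifting each new 3-bit code in. A minimal sketch; the oldest-value-first ordering and the helper names are assumptions:

```python
from typing import List

BITS_PER_VALUE = 3   # n = 3 bits gives 2**3 = 8 logic values
LOGIC_0 = 0b000
LOGIC_1 = 0b111
# Codes 0b001 to 0b110 remain for arbitrarily defined other states; which
# state gets which code is left open here.


def pack_history(values: List[int]) -> int:
    """Pack a time-ordered sequence of 3-bit values into one word, oldest first."""
    word = 0
    for v in values:
        if not 0 <= v < (1 << BITS_PER_VALUE):
            raise ValueError(f"value {v} does not fit in {BITS_PER_VALUE} bits")
        word = (word << BITS_PER_VALUE) | v
    return word


def unpack_history(word: int, length: int) -> List[int]:
    """Recover `length` 3-bit values from a packed word (inverse of pack_history)."""
    mask = (1 << BITS_PER_VALUE) - 1
    return [(word >> (BITS_PER_VALUE * (length - 1 - i))) & mask for i in range(length)]
```

For a fixed history length, distinct value sequences always give distinct words, which is what allows the history to serve as a unique associative key.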


25. A method as claimed in any of claims 14 to 24 in which there is stored a 
record of all values that a logic gate has acquired for the units of delay of the 
longest delay in the circuit. 

26. A parallel processing method of logic simulation comprising representing
signals on a line over a time period as a bit sequence, evaluating the output of
any logic gate by a comparison between the bit sequences of its inputs to a
predetermined series of bit patterns and in which those logic gates whose
outputs have changed over the time period are identified during the evaluation
of the gate outputs as real gate changes and only those real gate changes
are propagated to fan out gates and in which the control of the method is
carried out in an associative memory mechanism which stores in word form a
history of gate input signals by compiling a hit list register of logic gate state
changes and using a multiple response resolver forming part of the
associative memory mechanism which generates an address for each hit, and
then scans and transfers the results on the hit list to an output register for
subsequent use, characterised in that each line signal to a target logic gate is
stored as a plurality of bits each representing a delay of one time period, the
aggregate bits representing the delay between signal output to and reception
by the target logic gate and in which the inherent delay of each logic gate is
represented in the same manner.
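The core of claim 26 can be sketched for 2-input gates: each time period's input pair is compared against a per-gate-type template, the inherent delay shifts the result later, and only gates whose output sequence actually changed cause their fan-out gates to be re-evaluated. The template contents, the single `initial_out` value held during the delay, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical test templates: for each 2-input gate type, the input
# pairs (a, b) whose match drives the output to logic 1.
TEMPLATES: Dict[str, Set[Tuple[int, int]]] = {
    "AND": {(1, 1)},
    "OR": {(0, 1), (1, 0), (1, 1)},
    "XOR": {(0, 1), (1, 0)},
}


def evaluate_gate(gate_type: str, a: List[int], b: List[int],
                  delay: int, initial_out: int = 0) -> List[int]:
    """Return the output bit sequence of a 2-input gate over one time window.

    Each period's input pair is compared with the gate's template; the
    inherent delay shifts the result `delay` periods later, and until then
    the output holds `initial_out`.
    """
    matches = TEMPLATES[gate_type]
    raw = [1 if (x, y) in matches else 0 for x, y in zip(a, b)]
    return ([initial_out] * delay + raw)[:len(raw)]


def events_to_propagate(old_outputs: Dict[str, List[int]],
                        new_outputs: Dict[str, List[int]],
                        fan_out: Dict[str, List[str]]) -> Set[str]:
    """Keep only real gate changes and return the fan-out gates to re-evaluate."""
    to_evaluate: Set[str] = set()
    for gate, new in new_outputs.items():
        if new != old_outputs.get(gate):  # a real change over the time window
            to_evaluate.update(fan_out.get(gate, []))
    return to_evaluate
```

A gate whose re-evaluated sequence equals its previous one generates no event, so its fan-out gates are not touched.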

27. A method as claimed in claim 26, in which each delay is stored as a delay
word in an associative memory forming part of the associative memory
mechanism in which:-

the length of the delay word is ascertained; and

if the delay word width exceeds the associative register word width:-

the number of integer multiples of the register word width contained
within the delay word is calculated as a gate state;

the gate state is stored in a further state register;

the remainder from the calculation is stored in the associative register
with those delay words whose widths did not exceed the associative
register word width; and

on the count of the associative register commencing:-

the state register is consulted for the delay word entered in the state
register and the remainder is ignored for this count of the associative
register;

at the end of the count of the associative register, the state register is
updated; and

the count continues until the remainder represents the count still
required.



28. A method as claimed in claim 26 or 27, in which the hit list is segmented into
a plurality of separate smaller hit lists each connected to a separate scan
register and in which each scan register is operated in parallel to transfer the
results to the output register.

29. A method as claimed in any of claims 26 to 28, in which the scan registers are
controlled by exception logic using an OR gate whereby the scan is
terminated for each register on the OR gate changing state thus indicating no
further matches.

30. A method as claimed in claim 29 in which the scan is carried out by sequential
counting through the hit list and the steps are performed of:

checking if the bit is set indicating a hit;
if a hit, determining the address effected by that hit;
storing the address;
clearing the bit in the hit list;
moving to the next position in the hit list; and
repeating the above steps until the hit list is cleared.

31. A method as claimed in any of claims 26 to 30 in which there is an
initialisation phase in which:

specified signal values are inputted;

unspecified signal values are set to unknown;

test templates are prepared defining the delay model for each logic
gate;

the input circuit is parsed to generate an equivalent circuit consisting
of 2-input logic gates; and

the 2-input logic gates are then configured.

32. A method as claimed in any of claims 26 to 31 in which a multi-valued logic is
applied and in which n bits are used to represent a signal value at any
instance in time with n being any arbitrarily chosen logic.

33. A method as claimed in claim 32 in which an 8-valued logic is used where 000
represents logic 0, 111 represents logic 1 and 001 to 110 represent arbitrarily
defined other signal states.

34. A method as claimed in claim 32 or 33 in which the sequence of values on a
logic gate is stored as a bit pattern forming a unique word in the associative
memory mechanism.

35. A method as claimed in any of claims 26 to 34 in which there is stored a 
record of all values that a logic gate has acquired for the units of delay of the 
longest delay in the circuit. 


36. A parallel processor for logic event simulation (APPLES) comprising:- 

a main processor; 

an associative memory mechanism including a response resolver;



characterised in that the associative memory mechanism comprises:-

a plurality of separate associative sub-registers each for
the storage in word form of a history of gate input signals
for a specified type of logic gate; and

a plurality of separate additional sub-registers associated with each
associative sub-register whereby gate evaluations and tests can be
carried out in parallel on each associative sub-register.

37. A processor as claimed in claim 36, in which the additional sub-registers
comprise an input sub-register, a mask sub-register and a scan sub-register.



38. A processor as claimed in claim 37, in which the scan sub-registers are
connected to an output register.
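Claims 36 to 38 describe sub-registers that each hold words for one gate type, with input, mask and scan sub-registers attached. The masked comparison at the heart of such an associative search can be sketched as below; modelling each sub-register as a Python object and the search as a loop is an assumption, since the hardware compares all stored words at once.

```python
from typing import Dict, List


class AssociativeSubRegister:
    """Words for one gate type, searched against an input pattern under a mask."""

    def __init__(self, gate_type: str, words: List[int]) -> None:
        self.gate_type = gate_type
        self.words = list(words)

    def search(self, input_word: int, mask: int) -> List[int]:
        """Return a hit list: bit i is 1 when word i matches the input on every masked bit."""
        return [1 if ((w ^ input_word) & mask) == 0 else 0 for w in self.words]


def fill_output_register(sub_registers: List[AssociativeSubRegister],
                         inputs: Dict[str, int],
                         masks: Dict[str, int]) -> Dict[str, List[int]]:
    """Search each sub-register and let its scan step copy hit addresses to the output."""
    output: Dict[str, List[int]] = {}
    for reg in sub_registers:
        hits = reg.search(inputs[reg.gate_type], masks[reg.gate_type])
        output[reg.gate_type] = [addr for addr, bit in enumerate(hits) if bit]
    return output
```

Bits cleared in the mask are "don't care" positions, so one input pattern can select every stored word that agrees on the bits of interest.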